Determining causal models for controlling environments

ABSTRACT

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining causal models for controlling environments. One of the methods includes repeatedly selecting, by a control system for the environment, control settings for the environment based on internal parameters of the control system, wherein: at least some of the control settings for the environment are selected based on a causal model, and the internal parameters include a first set of internal parameters that define a number of previously received performance metric values that are used to generate the causal model for a particular controllable element; obtaining, for each selected control setting, a performance metric value; determining that generating the causal model for the particular controllable element would result in higher system performance; and adjusting, based on the determining, the first set of internal parameters.

BACKGROUND

This specification relates to controlling an environment and to determining causal relationships between control settings for controlling the environment and environment responses received from the environment.

Existing techniques for determining which control settings should be used to control an environment generally either employ modeling-based techniques or rely on active control of the environment.

In modeling-based techniques, the system passively observes data, i.e., historical mappings of control settings to environment responses, and attempts to discover patterns in the data to learn a model that can be used to control the environment. Examples of modeling-based techniques include decision forests, logistic regression, support vector machines, neural networks, kernel machines and Bayesian classifiers.

In active control techniques, the system relies on active control of the environment for knowledge generation and application. Examples of active control techniques include randomized controlled experimentation, e.g., bandit experiments.

SUMMARY

This specification describes methods and systems for controlling an environment. In one aspect, a method includes repeatedly selecting, by a control system for the environment, control settings for the environment based on internal parameters of the control system, wherein: at least some of the control settings for the environment are selected based on a causal model that identifies, for each controllable element, causal relationships between possible settings for the controllable element and a performance metric that measures a performance of the control system in controlling the environment, and the internal parameters include a first set of internal parameters that define a number of previously received performance metric values that are used to generate the causal model for a particular one of the controllable elements; obtaining, for each selected control setting, a performance metric value; determining, based on the performance metric values, a likelihood that generating the causal model for the particular controllable element would result in higher system performance; and determining, based on the likelihood, whether to adjust the first set of internal parameters.

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages.

The control systems described in this specification can automatically generate causal knowledge (in the form of a causal model) with the precision of controlled experimentation in a way that addresses many current limitations of conventional approaches, particularly when applied to dynamic systems. The described techniques deliver real-time understanding and quantification of causation while providing fully automated operational control and seamlessly integrated multi-objective optimization. The emergent behavior of this architecture is rational, robust and scalable, and provides strikingly fast learning and optimization applicable to complex and critical real-world systems, even those subject to rapidly changing direction, magnitude, and spatial-temporal extent of the relationships among variables, whether those variables are under the control of the system or not. Therefore, the described system can more efficiently control the environment, i.e., achieve better system performance according to the performance metric, while using fewer computational resources and less data than conventional techniques. Moreover, the system can more quickly respond to changes of relationships among variables, reducing the amount of time that the environment is being controlled sub-optimally, thereby mitigating negative consequences related to selecting sub-optimal settings. Moreover, the system can accomplish this while selecting control settings that are within acceptable or historical ranges, ensuring that the system does not deviate from safe ranges for the control settings.

In particular, the described techniques place some or all of the internal parameters of the system under recursive experimental control, i.e., adjust the values of the internal parameters during operation of the system, continuously adjusting the characteristics of the procedural instances to self-correct any false assumption or a priori bias, and dynamically optimizing control decisions relative to performance metrics, constraints and precision vs. granularity of causal knowledge. This results in techniques that are robust to all properties of statistical distributions and are epistemically efficient at exploring and exploiting search spaces including spatial-temporal effects, automatically adjusting the sampling and use of acquired data based on unbiased measurements of the degree to which the relationships among variables are changing across space and time and thus the extent to which data are representative of the current state of the world for real-time decision support. As a particular example, the system can repeatedly adjust the values of one or more of the internal parameters by monitoring a difference between the current system performance and baseline system performance, i.e., the performance of the system when controlling the environment using baseline probability distributions over possible control settings. The system can use changes in this difference and relative effects between different possible values of the internal parameters on this difference to ensure that the internal parameters have values that ensure effective system performance throughout operation, even as properties of the environment change and previously gathered data becomes less relevant.

In other words, unlike conventional systems, the described control system can very quickly adjust to changes in the relative causal effects between different control settings. Additionally, the described system does not require a priori knowledge of the effectiveness of any particular control setting and, in fact, can adjust for incorrect a priori knowledge that is provided as a baseline to the system. That is, when properties of the environment change, the system can detect the change and adjust the internal parameters to minimize the effect of the change on the effectiveness of the system's control of the environment.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows a control system that selects control settings that are applied to an environment.

FIG. 1B shows data from an example causal model.

FIG. 2 is a flow diagram of an example process for controlling an environment.

FIG. 3 is a flow diagram of an example process for performing an iteration of environment control.

FIG. 4A is a flow diagram of an example process for determining procedural instances.

FIG. 4B shows an example of an environment that includes multiple physical entities that are each associated with a spatial extent.

FIG. 5 is a flow diagram of an example process for selecting control settings for the current set of instances.

FIG. 6 is a flow diagram of an example process for updating the causal model for a given controllable element and a given type of environment response.

FIG. 7 is a flow diagram of an example process for clustering a set of procedural instances for a given controllable element.

FIG. 8 is a flow diagram of an example process for updating a set of internal parameters using stochastic variation.

FIG. 9 is a flow diagram of an example process for updating the value of a data inclusion window for a given controllable element based on heuristics.

FIG. 10 is a flow diagram of an example process for responding to a change in one or more properties of the environment.

FIG. 11 shows a representation of the data inclusion window for a given controllable element of the environment when the set of internal parameters that define the data inclusion window are stochastically varied.

FIG. 12 shows the performance of the described system when controlling an environment relative to the performance of systems that control the same environment using existing control schemes.

FIG. 13 shows the performance of the described system relative to the performance of multiple other systems when controlling multiple different environments.

FIG. 14 shows the performance of the described system relative to the performance of multiple other systems when controlling multiple different environments that have varied temporal effects.

FIG. 15 shows the performance of the described system with and without clustering.

FIG. 16 shows the performance of the described system with the ability to vary the data inclusion window relative to the performance of the described system controlling the same environment while holding the data inclusion window parameters fixed.

FIG. 17 shows the performance of the described system with and without temporal analysis, i.e., with the ability to vary the temporal extent and without.

FIG. 18 shows the performance of the described system when controlling an environment relative to the performance of a system that controls the same environment using an existing control scheme.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

This specification generally describes a control system that controls an environment as the environment changes states. In particular, the system controls the environment in order to determine causal relationships between control settings for the environment and environment responses to the control settings.

For example, the environment responses for which causal relationships are being determined can include (i) sensor readings or other environment measurements that reflect the state of the environment, (ii) a performance metric, e.g., a figure of merit or an objective function, that measures the performance of the control system based on environment measurements, or (iii) both.

In particular, the control system repeatedly selects control settings that each include respective settings for each of a set of controllable elements of the environment. Generally, the selection of different control settings results in differences in system performance, i.e., in different values of the environment responses.

More specifically, by repeatedly selecting control settings and measuring the impact of the control settings on the environment, the control system updates a causal model that models the causal relationships between control settings and the environment responses, i.e., updates maintained data that identifies causal relationships between control settings and system performance.

While the causal model is referred to as a “causal model,” the model can, in some implementations, be made up of multiple causal models that each correspond to different segments of the environment, i.e., to segments of the environment that share certain characteristics.

In some implementations, the control system can continue to operate and use the causal model to select control settings. In other implementations, once certain criteria are satisfied, the control system can provide the causal model to an external system or can provide data displaying the causal relationships identified in the causal model to a user for use in controlling the environment. For example, the criteria can be satisfied after the system has controlled the environment for a certain amount of time or has selected settings a certain number of times. As another example, the criteria can be satisfied when the causal relationships identified in the maintained data satisfy certain criteria, e.g., have confidence intervals that do not overlap.

While updating the causal model, the system repeatedly selects different control settings and measures the impact of each possible control setting on environment responses based on internal parameters of the control system and on characteristics of the environment.

In other words, the internal parameters of the control system define both (i) how the system updates the causal model and (ii) how the system determines which control settings to select given the current causal model. While updating the causal model, the control system also repeatedly adjusts at least some of the internal parameters as more environment responses become available to assist in identifying causal relationships.

FIG. 1A shows a control system 100 that selects control settings 104 that are applied to an environment 102. Each control setting 104 defines a setting for each of multiple controllable elements of the environment 102. Generally, the controllable elements of the environment are those elements that can be controlled by the system 100 and that can take multiple different possible settings. During operation, the control system 100 repeatedly selects control settings 104 and monitors environment responses 130 to the control settings 104. The system 100 also monitors the characteristics 140 of the environment 102. Generally, the characteristics 140 can include any data characterizing the environment that may modify the effect that control settings 104 have on environment responses 130 but that are not accounted for in the control settings, i.e., that are not controllable by the control system 100. The system 100 uses the environment responses 130 to update a causal model 110 that models causal relationships between control settings and the environment responses, i.e., that models how different settings for different elements affect values of the environment responses.

In particular, the causal model 110 measures, for each controllable element of the environment and for each different type of environment response, the causal effects of the different possible settings for the controllable element on the environment response and the current level of uncertainty of the system about the causal effects of the possible settings.

As a particular example, the causal model 110 can include, for each different possible setting of a given controllable element and for each different type of environment response, an impact measurement that represents the impact of the possible setting on the environment response relative to the other possible settings for the controllable element, e.g., a mean estimate of the true mean effect of the possible setting, and a confidence interval, e.g., a 95% confidence interval, for the impact measurement that represents the current level of system uncertainty about the causal effects.
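
As a rough illustration of the kind of quantities that can make up one entry of such a causal model, the Python sketch below computes a mean impact estimate and a 95% confidence interval from a set of per-instance effect values using a normal approximation; the helper names and the use of a normal approximation are assumptions made for illustration, not details taken from the specification.

    import math
    import statistics

    def impact_estimate(effects):
        """Mean estimate of the effect of one possible setting, computed
        from the effect values observed across procedural instances."""
        return statistics.mean(effects)

    def confidence_interval(effects, z=1.96):
        """Approximate 95% confidence interval around the mean effect;
        a wider interval reflects greater system uncertainty."""
        mean = statistics.mean(effects)
        sem = statistics.stdev(effects) / math.sqrt(len(effects))
        return (mean - z * sem, mean + z * sem)

    # Example: effect values observed for one setting across five instances.
    observed = [0.8, 1.1, 0.9, 1.3, 0.7]
    print(impact_estimate(observed), confidence_interval(observed))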

Prior to beginning to control the environment 102, the control system 100 receives external inputs 106. The external inputs 106 can include data received by the control system 100 from any of a variety of sources. For example, the external inputs 106 can include data received from a user of the system, data generated by another control system that was previously controlling the environment 102, data generated by a machine learning model, or some combination of these.

Generally, the external inputs 106 specify at least (i) initial possible values for the settings of the controllable elements of the environment 102 and (ii) which environment responses the control system 100 tracks during operation.

For example, the external inputs 106 can specify that the control system 100 needs to track measurements for certain sensors of the environment, a performance metric, i.e., a figure of merit or other objective function that is derived from certain sensor measurements, to be optimized by the system 100 while controlling the environment, or both.

The control system 100 uses the external inputs 106 to generate initial probability distributions (“baseline probability distributions”) over the initial possible setting values for the controllable elements. By initializing these baseline probability distributions using external inputs 106, the system 100 ensures that settings are selected that do not violate any constraints imposed by the external data 106 and, if desired by a user of the system 100, do not deviate from historical ranges for the control settings that have already been used to control the environment 102.

The control system 100 also uses the external inputs 106 to initialize a set of internal parameters 120, i.e., to assign baseline values to the set of internal parameters. Generally, the internal parameters 120 define how the system 100 selects control settings given the current causal model 110, i.e., given the current causal relationships that have been determined by the system 100 and the system uncertainty about the current causal relationships. The internal parameters 120 also define how the system 100 updates the causal model 110 using received environment responses 130.

As will be described in more detail below, the system 100 updates at least some of the internal parameters 120 while updating the causal model 110. That is, while some of the internal parameters 120 may be fixed to the initialized, baseline values during operation of the system 100, the system 100 repeatedly adjusts others of the internal parameters 120 during operation in order to allow the system to more effectively measure and, in some cases, exploit causal relationships.

In particular, in order to control the environment, during operation, the system 100 repeatedly identifies procedural instances within the environment based on the internal parameters 120.

Each procedural instance is a collection of one or more entities within the environment that are associated with a time window. An entity within the environment is a subset, i.e., either a proper subset or an improper subset, of the environment. In particular, an entity is a subset of the environment for which environment responses can be obtained and which can be impacted by applied control settings.

For example, when the environment includes multiple physical entities from which sensor measurements can be obtained, a given procedural instance can include a proper subset of the physical entities to which a set of control settings will be applied. The number of subsets into which the entities within the environment can be divided is defined by the internal parameters 120.

In particular, how the system 100 divides the entities into subsets at any given time during operation of the system is defined by internal parameters that define the spatial extent of the control settings applied by the system for the instance. The spatial extent of an instance identifies the subset of the environment that is assigned to the instance, i.e., such that environment responses that are obtained from that subset will be associated with the instance.

The length of the time window associated with the entities in any given procedural instance is also defined by the internal parameters 120. In particular, the time window that the system assigns to any given procedural instance is defined by internal parameters that define the temporal extent of the control settings applied by the system. This time window, i.e., the temporal extent of the instance, defines which future environment responses the system 100 will determine were caused by control settings that were selected for the procedural instance.

While each procedural instance is described as having a single spatial extent and temporal extent for ease of introduction, different controllable elements within a procedural instance can have different spatial extents, different temporal extents, or both. This will be described in more detail below.
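
For concreteness, a procedural instance of the kind described above could be represented roughly as follows; this Python sketch and its field names (entities, start_time, temporal_extent, and so on) are illustrative assumptions rather than structures defined in the specification.

    from dataclasses import dataclass, field

    @dataclass
    class ProceduralInstance:
        """A collection of entities plus the time window used to attribute
        environment responses to the control settings selected for it."""
        entities: list                # subset of the environment (spatial extent)
        start_time: float             # start of the associated time window
        temporal_extent: float        # length of the time window
        settings: dict = field(default_factory=dict)   # controllable element -> selected setting
        responses: list = field(default_factory=list)  # environment responses attributed to this instance

        def window(self):
            # The time window that determines which responses belong to this instance.
            return (self.start_time, self.start_time + self.temporal_extent)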

Because the internal parameters 120 change during operation of the system 100, the instances generated by the system 100 may also change. That is, the system can modify how the procedural instances are identified as the system changes the internal parameters 120.

The system 100 then selects settings for each instance based on the internal parameters 120 and, in some cases, on the environment characteristics 140.

In some cases, i.e., when the system 100 is not using the causal model that has been determined by the system, the system 100 selects the settings for all of the instances based on the baseline probability distributions.

In other cases, i.e., when the system 100 is exploiting the causal relationships that have already been determined to optimize an objective function, the system 100 selects the settings for some of the instances (“hybrid instances”) using the current causal model 110 while continuing to select the settings for others of the instances (“baseline instances”) based on the baseline probability distributions. More specifically, at any given time during operation of the system 100, the internal parameters 120 define the proportion of hybrid instances relative to the total number of instances.
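
One simple way to realize this split, assuming the ratio parameter is expressed as the fraction of instances that should be hybrid, is sketched below in Python; the function name and the probabilistic designation are illustrative choices, not details given in the specification.

    import random

    def split_hybrid_baseline(instances, hybrid_fraction, rng=random):
        """Designate each instance as hybrid (settings chosen from the causal
        model) or baseline (settings chosen from the baseline probability
        distributions) according to the current ratio parameter."""
        hybrid, baseline = [], []
        for instance in instances:
            (hybrid if rng.random() < hybrid_fraction else baseline).append(instance)
        return hybrid, baseline

    # With a fraction of 0 every instance is a baseline instance, as in the
    # initialization phase; during the exploit phase the fraction is above 0.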

The system 100 also determines, for each instance, which environment responses 130 will be associated with the instance, i.e., for use in updating the causal model 110, based on the internal parameters 120.

The system 100 then sets the settings 104 for each of the instances and monitors environment responses 130 to the settings that are selected for the instances. The system 100 maps the environment responses 130 to impact measurements for each instance and uses the impact measurements to determine causal model updates 150 that are used to update the current causal model 110.

In particular, the system determines, based on the internal parameters 120, which historical procedural instances (and the environment responses 130 associated with the instances) should be considered by the causal model 110, and determines the causal model updates 150 based only on these determined historical procedural instances. Which historical procedural instances are considered by the causal model 110 is determined by a set of internal parameters 120 that define a data inclusion window for each controllable element. The data inclusion window for a given controllable element specifies, at any given time, one or more historical time windows during which a procedural instance must have occurred in order for the results for that procedural instance, i.e., the environment responses 130 associated with that procedural instance, to be considered by the causal model 110 when determining causal relationships between the different settings for the controllable element and the environment responses.
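
A data inclusion window of this kind can be applied as a simple filter over historical procedural instances. The Python sketch below assumes the window is given as a list of (start, end) historical time intervals and that each instance records when it occurred (for example in a start_time field, as in the earlier sketch); the names are illustrative.

    def instances_in_window(historical_instances, inclusion_windows):
        """Keep only the historical procedural instances that occurred inside
        one of the data inclusion windows, so that only their environment
        responses contribute to the causal model update."""
        kept = []
        for instance in historical_instances:
            if any(start <= instance.start_time <= end
                   for start, end in inclusion_windows):
                kept.append(instance)
        return kept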

For those internal parameters that are being varied by the system 100, the system 100 periodically also updates 160 the data that is maintained by the system 100 for those internal parameters based on the causal model 110 or on other results of controlling the environment 102. In other words, as the causal model 110 changes during operation of the system 100, the system 100 also updates the internal parameters 120 to reflect the changes in the causal model 110. In cases where the system 100 assigns some control settings to exploit the current causal model 110, the system 100 can also use the difference between system performance for “hybrid” instances and “baseline” instances to determine the internal parameter updates 160.

The system 100 can be used to control any of a variety of environments and, in particular, can be used to generate and exploit a causal model 110 that measures relationships between control settings for controllable elements of the environment and environment responses 130.

For example, the described system can be used for manufacturing process control, power grid control, battery management control, display control, sensor applications, semiconductor polishing control, facilities control, individual medicine, biological manufacturing control, facility operations control, advertising campaign control, e-commerce campaign control, and so on. For example, in a display application, the control settings may be signals for controlling pixel driving schemes, and the environment responses may include pixel lifetime. As another example, in a sensor application, the control settings may control calibration and/or interpretation of sensors in a sensor network, and the environment responses may include a measure of accuracy of results from the network. The methods described herein can be applied to a wide variety of manufacturing processes, for example, including those described in U.S. Pat. Appl. Pub. Nos. 2001/0047846 (Currens et al.), 2004/0099993 (Jackson et al.), 2006/0274244 (Battiato et al.), 2010/0103521 (Smith et al.), 2010/0201242 (Liu et al.), 2013/0038939 (Walker, Jr. et al.), and in U.S. Pat. Nos. 5,882,774 (Jonza et al.) and 5,976,424 (Weber et al.), for example. Each of these is hereby incorporated by reference herein in their entirety. As one illustrative manufacturing process example, the methods can be used to determine optimum control settings in a process for manufacturing film, such as brightness enhancement film. The control settings may control one or more of line speed, oven temperature, or viscosity (e.g., through material composition), for example, and the measurable outcomes may include one or more of an on-axis brightness gain provided by the brightness enhancement film, a defect density, and a prism pitch.

FIG. 1B shows data from an example causal model. In particular, in the example of FIG. 1B, the causal model is represented as a chart 180 that shows control settings, i.e., different possible settings for different controllable elements, on the x axis and causal effects for the control settings on the y axis. In particular, for each possible setting of each controllable element, the causal model depicts an impact measurement and confidence interval around that impact measurement.

These causal relationships are shown in more detail for a particular controllable element 192 in an element-specific chart 190. The element-specific chart 190 shows that there are five possible settings for the controllable element 192, with possible settings being referred to as levels in the chart 190. For each of the five settings, the chart includes a bar that represents the impact measurement and a representation of the confidence interval around the bar as error bars around the impact measurement. Thus, the information in the causal model for any given setting for a controllable element includes an impact measurement and a confidence interval around that impact measurement. For example, for the second setting 192 (denoted as IV-LV2 in the Figure), the chart 190 shows a column 194 that indicates the impact measurement for the second setting, an upper bar 196 above the top of the column 194 that shows the upper limit of the confidence interval for the second setting, and a lower bar 198 below the top of the column 194 that shows the lower limit of the confidence interval for the second setting.

While FIG. 1B shows a single causal model, it will be understood from the description below that the system can maintain and update multiple different causal models for any given controllable element—one for each cluster of procedural instances.

FIG. 2 is a flow diagram of an example process 200 for controlling an environment. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 200.

The system assigns baseline values to a set of internal parameters and baseline probability distributions to each of the controllable elements of the environment (step 202).

In particular, the system receives external data, e.g., from a user of the system or data derived from previous control of the environment by another system, and then uses the external data to assign the baseline values and to generate the probability distributions. Generally, the external data specifies the initial constraints that the system operates within when controlling the environment.

In particular, the external data identifies the possible control settings for each of the controllable elements in the environment. That is, the external data identifies, for each of the controllable elements in the environment, which possible settings the system can select for the controllable element when controlling the environment.

In some cases, the external data can specify additional constraints for possible control settings, e.g., that the settings for certain controllable elements are dependent on the settings for other controllable elements or that certain entities can only be associated with a certain subset of the possible control settings for a given controllable element.

Thus, the external data defines the search space of possible combinations of control settings that can be explored by the system when controlling the environment.

In some implementations, these constraints can change during operation of the system.

For example, the system can receive additional external inputs modifying the possible control settings for one or more of the controllable elements or the ranges of values for the spatial and temporal extents.

As another example, if the system determines that optimal settings for one of the controllable elements or for one of the internal parameters are approaching the boundary of the search space defined by external constraints, e.g., if the impact measurements in the causal model indicate the most optimal settings are those closest to one of the boundaries of the search space, the system can seek authorization to expand the space of possible values for the controllable element or the internal parameter, e.g., from a system administrator or other user of the system.

As yet another example, if the external data specifies that the possible settings for some controllable element can be any value in a continuous range, the system can initially discretize the range in one way and then, once the confidence intervals indicate strongly enough that the optimal values are in one segment of the continuous range, modify the discretization to favor that segment.

As yet another example, if the system determines that a certain controllable element has no causal effect on the environment responses, e.g., if the impact measurements for all possible settings of the controllable element are all likely to be zero, the system can seek authorization to remove the controllable element from being controlled by the system.

The system then generates a baseline (or “prior”) probability distribution over the possible control settings for each of the controllable elements of the environment. For example, when the external data specifies only the possible values for a given controllable element and does not assign priorities to any of the possible values, the system can generate a uniform probability distribution over the possible values that assigns an equal probability to each possible value. As another example, when the external data prioritizes certain settings over others for a given controllable element, e.g., based on historical results of controlling the environment, the system can generate a probability distribution that assigns higher probability to the prioritized settings.
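
A minimal sketch of this initialization step is given below in Python; the helper name baseline_distribution and the idea of expressing priorities as non-negative weights are assumptions made for illustration.

    def baseline_distribution(possible_settings, priorities=None):
        """Return a baseline probability distribution over the possible settings
        of one controllable element: uniform when no priorities are given,
        otherwise proportional to the supplied weights."""
        if priorities is None:
            p = 1.0 / len(possible_settings)
            return {setting: p for setting in possible_settings}
        total = sum(priorities[s] for s in possible_settings)
        return {s: priorities[s] / total for s in possible_settings}

    # Uniform baseline over three possible settings:
    print(baseline_distribution(["low", "medium", "high"]))
    # Baseline that favors settings prioritized by the external data:
    print(baseline_distribution(["low", "medium", "high"],
                                {"low": 1, "medium": 2, "high": 7}))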

The system also assigns baseline values to each of the internal parameters of the system. In particular, the internal parameters of the system include (i) a set of internal parameters that define the spatial extents of the procedural instances generated by the system (referred to as “spatial extent parameters”) and (ii) a set of internal parameters that define the temporal extents of the procedural instances generated by the system (referred to as “temporal extent parameters”).

In some cases where the environment includes multiple entities, the system can maintain separate sets of spatial extent parameters and temporal extent parameters for each of the multiple entities. In other cases where the environment includes multiple entities, the system maintains only a single set of spatial and temporal extent parameters that applies to all of the multiple entities. In yet other cases where the environment includes multiple entities, the system initially maintains a single set of spatial and temporal extent parameters and, during operation of the system, can switch to maintaining a separate set of spatial extent or temporal extent parameters if doing so results in improved system performance, i.e., if different entities respond to control settings differently from other entities.

Additionally, in some implementations, the system maintains separate sets of temporal extent parameters for different controllable elements.

The system also maintains (iii) a set of internal parameters that define the data inclusion window used by the system (referred to as “data inclusion window parameters”). In some implementations, the system maintains a single set of data inclusion window parameters that applies to all of the controllable elements. In some other implementations, the system maintains a separate set of data inclusion window parameters for each controllable element of the environment, i.e., to allow the system to use different data inclusion windows for different controllable elements when updating the causal model. As will be described in more detail below, in cases where the system has clustered the procedural instances into more than one cluster, the system can either (a) maintain a separate set of data inclusion window parameters per cluster or (b) maintain a separate set of data inclusion window parameters per cluster and per controllable element, i.e., so that different clusters can use different data inclusion windows for the same controllable element.

In implementations where the system exploits the causal model, the internal parameters also include (iv) a set of internal parameters that define the hybrid instance to baseline instance ratio (referred to as “ratio parameters”). In some implementations, the system maintains a single set of ratio parameters that applies to all of the controllable elements. In some other implementations, the system maintains a separate set of ratio parameters for each controllable element of the environment, i.e., to allow the system to use different ratios for different controllable elements when selecting control settings. As will be described in more detail below, in cases where the system has clustered the procedural instances into more than one cluster, the system can either (a) continue to maintain a single set of ratio parameters across all of the clusters, (b) maintain a separate set of ratio parameters per cluster or (c) maintain a separate set of ratio parameters per cluster and per controllable element, i.e., so that different clusters can use different ratios when selecting control settings for the same controllable element.

In implementations where the system clusters the instances into multiple clusters, as will be described below, the internal parameters also include (v) a set of internal parameters that define the current clustering strategy (referred to as “clustering parameters”).

Generally, the clustering parameters are or define the hyperparameters of the clustering technique that is used by the system. Examples of such hyperparameters include the cluster size of each cluster, i.e., the number of procedural instances in each cluster, and the environmental characteristics that are used to cluster the procedural instances.

The system maintains a set of clustering parameters for each controllable element. That is, for each controllable element, the system uses different hyperparameters when applying the clustering techniques to generate clusters of procedural instances for that controllable element.
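
The specification does not commit to a particular clustering technique, so the Python sketch below only illustrates the shape of such a step, using k-means from scikit-learn as one possible choice; the function name, the use of k-means, and the assumption that each instance exposes a characteristics mapping are all illustrative.

    from sklearn.cluster import KMeans

    def cluster_instances(instances, characteristic_keys, n_clusters):
        """Group procedural instances into clusters using the subset of
        environment characteristics and the cluster count given by the
        clustering parameters for one controllable element."""
        features = [[inst.characteristics[k] for k in characteristic_keys]
                    for inst in instances]
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
        clusters = {}
        for inst, label in zip(instances, labels):
            clusters.setdefault(label, []).append(inst)
        return clusters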

The internal parameters can also optionally include any of a variety of other internal parameters that impact the operation of the control system. For example, the internal parameters may also include a set of internal parameters that define how to update the causal model (e.g. a set of weights, each representing the relative importance of each environment characteristic during propensity matching between procedural instances, which can be used to compute d-scores as described below).

As described above, the system varies at least some of these internal parameters during operation.

For each set of internal parameters that the system varies while controlling the environment, the system can vary the values using (i) a heuristic-based approach, (ii) stochastic sampling of values to optimize a figure of merit for the internal parameter, or (iii) both.

For any sets of internal parameters that are varied only based on heuristics, the system maintains a single value for the internal parameter and repeatedly adjusts that single value based on the heuristics.

For any sets of internal parameters that are varied by stochastically sampling, the system maintains parameters that define a range of possible values for the internal parameter and maintains a causal model that identifies causal relationships between the possible values for the internal parameter and a figure of merit for the internal parameter. The figure of merit for the internal parameter may be different from the performance metric used in the causal model for the control settings. For at least some of the instances at any given time, the system then selects values from within the range of possible values based on the current causal model.

When the system updates a set of internal parameters using heuristics in addition to stochastic sampling, the system can update the range of possible values using the heuristics. That is, the range of possible values is updated through the heuristic-based approach, while the causal model for the values within the range at any given time is updated through stochastic sampling.
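
The following Python sketch illustrates one way such stochastic variation might look: candidate values for an internal parameter are drawn from its current range with probabilities informed by per-value figure-of-merit estimates, while heuristics may separately shift the range itself. The names and the softmax-style weighting are illustrative assumptions, not details from the specification.

    import math
    import random

    def sample_internal_parameter(candidate_values, merit_estimates, temperature=1.0):
        """Pick a value for an internal parameter from its current range,
        favoring values whose estimated figure of merit is higher."""
        weights = [math.exp(merit_estimates.get(v, 0.0) / temperature)
                   for v in candidate_values]
        return random.choices(candidate_values, weights=weights, k=1)[0]

    # Example: candidate data inclusion window lengths (in hours) and their
    # current figure-of-merit estimates for this internal parameter.
    windows = [24, 48, 96, 192]
    merits = {24: 0.2, 48: 0.9, 96: 0.5, 192: -0.1}
    print(sample_internal_parameter(windows, merits))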

For any internal parameters that the system does not vary while controlling the environment, the system can either maintain a fixed range of values and a fixed probability distribution over the fixed range of values or a fixed single value that is always the value used during the operation of the system.

Depending on what is included in the external data, the system assigns to each internal parameter a baseline value that is either derived from the external data or is a default value.

For example, the external data generally identifies a range of values for the spatial and temporal extents. For example, when the spatial extent is not fixed and is an internal parameter that can be varied by the system, the external data can specify a minimum and maximum value for the spatial extent. Similarly, when the temporal extent is not fixed and is an internal parameter that can be varied by the system, the external data can specify a minimum and maximum value for the temporal extent.

The system then uses the external data to assign the initial values to the spatial extent parameters so that the parameters define the range of values that is specified in the external data and assigns initial values to the temporal extent parameters so that the parameters define the range of values that is specified in the external data.

For other internal parameters, the system assigns default values. For example, the system can initialize the clustering parameters to indicate that the number of clusters is 1, i.e., so that there is no clustering at the outset of controlling the environment, and can initialize the ratio parameters to indicate that there are no hybrid instances, i.e., so that the system only explores at the outset of controlling the environment. The system can also initialize the data inclusion window parameters to indicate that the data inclusion window includes all historical procedural instances that have been completed.

The system performs an initialization phase (step 204). During the initialization phase, the system selects control settings for the procedural instances based on the baseline probability distributions for the controllable elements and uses the environment responses to update the causal model. That is, so long as no historical causal model was provided as part of the external data, the system does not consider the current causal model when determining which control settings to assign to the procedural instances.

Instead, the system selects control settings using the baseline probability distribution in accordance with an assignment scheme that allows impact measurements, i.e., d-scores, to later be computed effectively. In other words, the assignment scheme selects control settings in a manner that accounts for the blocking scheme that is used by the system to compute impact measurements, i.e., assigns control settings to different procedural instances that allow blocked groups to later be identified in order to compute impact measurements between the blocked groups. The blocking scheme (and, accordingly, the assignment scheme) employed by the system can be any of a variety of schemes that reduce unexplained variability between different control settings. Examples of blocking schemes that can be employed by the system include one or more of double-blind assignment, pair-wise assignment, latin-square assignment, propensity matching, and so on. Generally, the system can use any appropriate blocking scheme that assigns procedural instances to blocked groups based on the current environment characteristics of the entities in the procedural instances.
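
As one concrete illustration, pair-wise assignment (only one of the blocking schemes listed above) might look roughly like the Python sketch below, which pairs procedural instances with similar environment characteristics and gives the two members of each pair different settings so that impact measurements can later be computed between matched instances; the helper names and the single-characteristic sort are illustrative assumptions.

    import random

    def pairwise_assign(instances, possible_settings, characteristic_key):
        """Sort instances by one environment characteristic, pair neighbors
        (a simple form of blocking), and give the two members of each pair
        different randomly chosen settings."""
        ordered = sorted(instances,
                         key=lambda inst: inst.characteristics[characteristic_key])
        assignments = []
        for first, second in zip(ordered[::2], ordered[1::2]):
            setting_a, setting_b = random.sample(possible_settings, 2)
            assignments.append((first, setting_a))
            assignments.append((second, setting_b))
        # An unpaired trailing instance, if any, is left for a later iteration.
        return assignments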

When one or both of the spatial and temporal extent can be varied by the system, the system varies the spatial extent parameters, the temporal extent parameters, or both, during the initialization phase so that values of the spatial and temporal extents that are more likely to result in sufficiently orthogonal procedural instances are more likely to be selected. A group of instances is considered to be orthogonal if the control settings applied to one of the instances in the group do not affect the environment responses that are associated with any of the other instances in the group.

Selecting control settings and updating the causal model while in the initialization phase is described in more detail below with reference to FIG. 3. Varying the spatial or temporal extent parameters is described in more detail below with reference to FIG. 11.

In some implementations, the system continues in this initialization phase throughout the operation of the system. That is, the system continues to explore the space of possible control settings and compiles the results of the exploration in the causal model.

As an example, the system can continue in this initialization phase when the system is updating a causal model with respect to multiple different environment responses rather than with respect to a single figure of merit or objective function, i.e., when the system does not have a figure of merit or objective function to use when exploiting the causal model.

In some of these implementations, the system continues to explore the space of possible control settings while also adjusting certain ones of the sets of internal parameters based on the causal model, e.g., the spatial extent parameters, the temporal extent parameters, the data inclusion window parameters, the clustering parameters, and so on.

In some other implementations, once the system determines that certain criteria are satisfied, the system begins performing a different phase. In these implementations, during the initialization phase, the system holds certain ones of the internal parameters fixed. For example, the system can hold the data inclusion window parameters fixed to indicate that all historical instances should be incorporated in the causal model. As another example, the system can hold the clustering internal parameters fixed to indicate that no clustering should be performed.

In particular, in these other implementations, once the system determines that the criteria are satisfied, the system can begin performing an exploit phase (step 206).

For example, the system can begin performing the exploit phase once the number of procedural instances for which environment responses have been collected exceeds a threshold value. As a particular example, the system may determine that the threshold value is satisfied when the total number of such procedural instances exceeds the threshold value. As another particular example, the system can determine that the threshold value is satisfied when the minimum number of environment responses associated with any one possible setting for any controllable element exceeds the threshold value.

Additionally, in some cases the system does not employ an initialization phase and immediately proceeds to the exploit phase, i.e., does not perform step 204.

When a threshold is used, the system can determine the threshold value in any of a variety of ways.

As one example, the system can determine that the threshold value is satisfied when environment responses have been collected for enough instances such that assigning settings for instances based on the causal model results in different settings having different likelihoods of being selected. How to assign likelihoods based on the causal model is described in more detail below with reference to FIG. 5.

As another example, the system can determine the threshold value to be the number of procedural instances that are required for the statistical test that the system performs to determine confidence intervals to yield accurate confidence intervals, i.e., the number of procedural instances that satisfies the statistical assumptions for the confidence computations.

As another example, the system can determine the threshold value to be equal to the number of procedural instances that are required to have the causal model yield the desired statistical power, i.e., as determined by a power analysis.
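
One of the simpler criteria above, requiring a minimum number of environment responses for every possible setting before entering the exploit phase, could be checked along the lines of the Python sketch below; the function name and the counting scheme are illustrative assumptions.

    def ready_to_exploit(response_counts, threshold):
        """response_counts maps (controllable element, possible setting) pairs to
        the number of environment responses collected so far; the exploit phase
        can begin once every setting has at least `threshold` responses."""
        return all(count >= threshold for count in response_counts.values())

    counts = {("oven_temperature", "low"): 14,
              ("oven_temperature", "high"): 11,
              ("line_speed", "slow"): 9,
              ("line_speed", "fast"): 13}
    print(ready_to_exploit(counts, threshold=10))  # False: one setting has only 9 responses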

During the exploit phase, the system selects control settings for some of the procedural instances based on the current causal model while continuing to select control settings for other procedural instances based on the baseline values of the internal parameters.

In particular, the system varies the ratio internal parameters so that the ratio between how many procedural instances should be hybrid instances, i.e., instances for which the control settings are assigned based on the causal model, and how many procedural instances should be baseline instances, i.e., instances for which the control settings are assigned based on the baseline probability distributions, is greater than zero.

Because the system begins designating certain instances as hybrid instances during the exploit phase, the system can begin using the difference in system performance between hybrid instances and baseline instances to adjust values of the internal parameters, e.g., the ratio internal parameters, the data inclusion window parameters, and so on.

Selecting control settings, updating the causal model, and updating the internal parameters while in the exploit phase are described in more detail below with reference to FIG. 3.

In some implementations, once the system determines that certain criteria are satisfied, the system begins a clustering phase (step 208). That is, if the system is configured to cluster procedural instances, the system begins the clustering phase once the criteria for clustering are satisfied. If the system is not configured to cluster instances, the system does not cluster procedural instances at any point during operation of the system.

Generally, the system considers clustering to create sub-populations of similar procedural instances. In a real-world situation, different procedural instances across a population might respond differently to different control settings. The optimal control setting for one procedural instance might be suboptimal for another. These differences might affect the distributions of the performance metrics seen across instances. If one control setting is selected for the entirety of the population, a detrimental effect in the overall utility, i.e., the overall performance of the system, may result. To maximize the overall utility across the entire population, the system can cluster the instances into sub-populations, taking their individual characteristics (modelled in their environment characteristics) and their feedback characteristics (modelled in the performance metrics received for control settings) into account. The system selects control settings at the level of these sub-populations.

Depending on the implementation and on the criteria, the system can begin the clustering phase during the initialization phase or during the exploit phase. That is, despite FIG. 2 indicating that clustering is step 208 while the initialization phase and the exploit phase are steps 204 and 206, respectively, the clustering phase overlaps with the initialization phase, the exploit phase, or both.

During the clustering phase, prior to assigning control settings to procedural instances, the system clusters the procedural instances into clusters based on current values of the clustering internal parameters and on the characteristics of the procedural instances. As described above, the clustering internal parameters for any given controllable element define the hyperparameters of the clustering technique that will be used to cluster for that controllable element.

Once the system has begun the clustering phase, at any given time, the system maintains a separate causal model for each cluster. That is, the system identifies separate causal relationships within each cluster. As described above, the system can also maintain separate sets of internal parameters for at least some of the internal parameters for each cluster.

The below description will generally describe that separate sets of ratio parameters and data inclusion window parameters are maintained per cluster and per controllable element. However, it should be understood that, when the system maintains only a single set of some type of parameter per cluster, the described computations only need to be performed once for each cluster and the result of the single computation can be used for each controllable element for the cluster. Similarly, when the system maintains only a single set of some type of parameter for all of the clusters, the described computations only need to be performed once and the result of the single computation can be used for all of the controllable elements in all of the clusters.

During the exploit phase, once clustering has been initiated, within a given cluster the system selects control settings for some of the procedural instances in the cluster based on the current causal model while continuing to select control settings for other procedural instances based on the baseline values of the internal parameters.

The system can employ any of a variety of criteria to determine when to begin clustering, i.e., to determine when the clustering internal parameters can begin to vary from the baseline values that indicate that the total number of clusters must be set to one.

As one example, one criterion may include that sufficient environment responses have been collected, e.g., once the number of environment responses that have been collected exceeds a threshold value. As a particular example, the system may determine that the threshold value is satisfied when the total number of environment responses exceeds the threshold value. As another particular example, the system can determine that the threshold value is satisfied when the minimum number of environment responses associated with any one possible setting for any controllable element exceeds the threshold value.

As another example, another criterion can specify that the system can begin clustering once the system has determined that, for any one of the controllable elements, different environment characteristics impact the causal effects of different control settings for that controllable element differently. As a particular example, this criterion can specify that the system can begin clustering when the d-score distributions for any controllable element are statistically different between any two procedural instances, i.e., that the d-score distribution in a causal model that is based only on the environment responses for one procedural instance is statistically different, i.e., to a threshold level of statistical significance, from the d-score distribution in a causal model that is based only on the environment responses for another procedural instance.
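
The specification does not name a particular statistical test for this comparison; the Python sketch below uses Welch's t-test from SciPy purely as an illustrative choice, and the function name and significance threshold are assumptions.

    from scipy.stats import ttest_ind

    def d_scores_differ(d_scores_a, d_scores_b, alpha=0.05):
        """Return True if the d-score distributions from two procedural
        instances differ at the chosen significance level, which could serve
        as one trigger for beginning the clustering phase."""
        _, p_value = ttest_ind(d_scores_a, d_scores_b, equal_var=False)
        return p_value < alpha

    print(d_scores_differ([0.9, 1.1, 1.0, 1.2], [0.1, 0.2, 0.0, 0.3]))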

Selecting control settings, updating the causal model, and updating the internal parameters while in the clustering phase are described in more detail below with reference to FIG. 3.

FIG. 3 is a flow diagram of an example process 300 for performing an iteration of environment control. For convenience, the process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 300.

The system can repeatedly perform the process 300 to update a causal model that measures causal relationships between control settings and environment responses.

The system determines a set of current procedural instances based on the current internal parameters (step 302). As will be described in more detail below with reference to FIG. 4A, the system determines, based on the current internal parameters, a spatial extent and a temporal extent, e.g., based on how likely different spatial and temporal extents are to result in instances that are orthogonal, and then generates the current procedural instances based on the spatial and temporal extent.

As described above, each procedural instance is a collection of one or more entities within the environment and is associated with a time window. The time window associated with a given procedural instance defines, as described in more detail below, which environment responses the system attributes to or associates with the procedural instance.

In some cases, for each controllable element, the system also determines how long the setting which is selected for the controllable element will be applied as a proportion of the time window associated with the controllable element, e.g., the entire time window, the first quarter of the time window, or the first half of the time window. Generally, the duration for which settings are applied can be fixed to a value that is independent of the time window, can be a fixed proportion of the time window, or the proportion of the time window can be an internal parameter that is varied by the system.

Determining the current set of procedural instances is described in more detail below with reference to FIG. 4A.

In cases where the environment includes only a single physical entity, the current set of instances may contain only one instance. Alternatively, the system can identify multiple current instances, with each current instance including the single physical entity but being separated in time, i.e., by at least the temporal extent for the entity.

The system assigns control settings for each current instance (step 304). The manner in which the system assigns the control settings for any given instance is dependent on which control phase the system is currently performing.

As described above, at the outset of controlling the environment, i.e., before enough information is available to determine causal relationships with any confidence, the system operates in an initialization phase. In the initialization phase, the system selects control settings for instances without considering the current causal model, i.e., the system explores the space of possible control settings. That is, the system selects control settings for the instances in accordance with the baseline probability distributions over the possible control settings for each controllable element.

In some implementations, during the initialization phase the system varies the internal parameters that determine the spatial extents, the temporal extents, or both of the procedural instances in order to identify how likely each possible value for the spatial and temporal extents is to result in instances that are orthogonal to one another.

As described above, in some implementations, the set of control phasesincludes only the initialization phase and the system continues tooperate in this initiation phase throughout, i.e., continues to explorethe space of possible control settings while compiling environmentresponses in order to update the causal model.

In some other implementations, the system shifts into an exploit phaseonce certain criteria are satisfied. In the exploit phase, the systemselects control settings for some of the current instances based on thecurrent causal model, i.e., to exploit the causal relationshipscurrently being reflected in the causal model, while continuing toselect control settings for others of the current instances based on thebaseline values of the internal parameters.

Additionally, in some implementations, either during the initializationphase or the exploit phase, the system begins performing clustering.

When clustering is being performed, the system clusters the procedural instances into clusters. Within each cluster, the system proceeds independently as described above.

That is, during the initialization phase, the system uses the baseline distributions to select settings independently within each cluster while, during the exploit phase, the system assigns control settings for some of the current instances based on the current causal model independently within each cluster while continuing to select control settings for others of the current instances based on the baseline values of the internal parameters independently within each cluster.

By performing clustering, the system is able to conditionally assign control settings based on (i) the factorial interactions between the impact of the settings on environment responses and the environment characteristics of the instances, e.g., the attributes of the instances that cannot be manipulated by the control system, (ii) the factorial interactions of different independent variables, or (iii) both.

Selecting control settings in the exploit phase with and without clustering is described in more detail below with reference to FIG. 5.

The system obtains environment responses for each of the procedural instances (step 306).

In particular, the system monitors environment responses and determines which environment responses to attribute to which current instance based on the time window associated with each procedural instance.

More specifically, for each procedural instance, the system associates with the procedural instance each environment response that (i) corresponds to the entities in the procedural instance and (ii) is received during some portion of the time window associated with the procedural instance. As a particular example, to limit carryover effects from previous control setting assignments, the system can associate with the procedural instance each environment response that corresponds to the entities in the instance and is received more than a threshold duration of time after the start of the time window, e.g., during the second half of the time window, the last third of the time window, or the last quarter of the time window. In some implementations, this threshold duration of time is fixed. In other implementations, the system maintains a set of internal parameters that define this threshold duration and varies the duration during operation of the system.
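A minimal sketch of this attribution rule follows; the EnvironmentResponse record, its fields, and the threshold fraction are hypothetical stand-ins for whatever response representation a particular implementation uses.

    from dataclasses import dataclass

    @dataclass
    class EnvironmentResponse:
        entity_id: str
        timestamp: float  # seconds since the start of control
        value: float

    def attribute_responses(instance_entities, window_start, window_end, responses,
                            threshold_fraction=0.5):
        """Return the responses attributed to one procedural instance: responses from
        the instance's entities that arrive after a threshold portion of its time
        window (e.g., the second half) to limit carryover from earlier assignments."""
        threshold_time = window_start + threshold_fraction * (window_end - window_start)
        return [
            r for r in responses
            if r.entity_id in instance_entities and threshold_time <= r.timestamp <= window_end
        ]

    responses = [EnvironmentResponse("e1", 30.0, 0.7), EnvironmentResponse("e1", 80.0, 0.9),
                 EnvironmentResponse("e2", 80.0, 0.4)]
    print(attribute_responses({"e1"}, window_start=0.0, window_end=100.0, responses=responses))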

The system updates the causal model based on the obtained environment responses (step 308). Updating the causal model is described in more detail below with reference to FIG. 6.

The system updates at least some of the internal parameters (step 310) based on the updated causal model, on the current performance of the system relative to the baseline performance of the system, or both.

In particular, the system can update any of a variety of the sets of internal parameters based on a heuristic-based approach, by stochastic variation, or both. The heuristic-based approach can include heuristics that are derived from one or more of: the updated causal model, the current performance of the system relative to the baseline performance of the system, or criteria determined using a priori statistical analyses.

In other words, for each set of internal parameters that the system is able to vary, the system can use one or more of the above techniques to update the set of internal parameters to allow the system to more accurately measure causal relationships.

In some cases, the system constrains certain sets of internal parameters to be fixed even if the system is able to vary the internal parameters. For example, the system can fix the data inclusion window parameters and the clustering parameters during the initialization phase. As another example, the system can fix the clustering parameters until certain criteria are satisfied, and then begin varying all of the internal parameters that are under the control of the system during the exploit phase after the criteria have been satisfied.

Updating a set of internal parameters is described in more detail below with reference to FIGS. 8-12.

Generally, the system can perform steps 302-306 with a different frequency than step 308 and can perform step 310 with a different frequency than both steps 302-306 and step 308. For example, the system can perform multiple iterations of steps 302-306 for each iteration of step 308 that is performed, i.e., to collect environment responses to multiple different sets of instances before updating the causal model. Similarly, the system can perform multiple iterations of step 308 before performing step 310, i.e., can perform multiple different causal model updates before updating the internal parameters.

FIG. 4A is a flow diagram of an example process 400 for determining procedural instances. For convenience, the process 400 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 400.

The system selects a spatial extent for each of the entities in the environment (step 402). The spatial extent for a given entity defines the segment of the environment that, when controlled by a given set of control settings, impacts the environment responses that are obtained from the given entity. The spatial extent for a given entity is defined by a set of spatial extent parameters, e.g., either a set of spatial extent parameters that is specific to the given entity or a set that is shared among all entities. In some implementations, the spatial extent internal parameters are fixed, i.e., are held constant to the same value or are sampled randomly from a fixed range throughout the controlling of the environment. For example, if the environment includes only a single entity, each procedural instance will include the same, single entity. As another example, if the environment includes multiple entities, but there is no uncertainty about which entities are affected by a control setting, the spatial extent parameters can be fixed to a value that ensures that the instances generated will be orthogonal.

When the spatial extent is not fixed and a single value is maintained for the spatial extent parameters (i.e., the spatial extent parameters are updated based on heuristics only), the system selects the current value of the spatial extent parameter for each entity as the spatial extent for the entity. When the spatial extent is not fixed and a range of values is defined by the spatial extent parameters, the system samples a value for the spatial extent from the range currently defined by the spatial extent parameters based on the current causal model for the spatial extent parameters for the entity.

By selecting the spatial extents for the entities, the system defines how many entities are in each procedural instance and which entities are included in each procedural instance. In particular, the system generates the procedural instances such that no procedural instance covers a segment of the environment that is even partially within the spatial extent of an entity in another procedural instance.

FIG. 4B shows an example of a map 420 of an environment that includes multiple physical entities that are each associated with a spatial extent. In particular, FIG. 4B shows an environment that includes multiple physical entities, represented as dots in the Figure, within a portion of the United States. The spatial extent selected by the system for each entity is represented by a shaded circle. For example, the system may maintain a range of possible radii for each entity and can select the radius of the shaded circle for each entity from the range. As can be seen from the example of FIG. 4B, different entities can have different spatial extents. For example, entity 412 has a different sized shaded circle than entity 414.

As can also be seen from the example of FIG. 4B, the system can also optionally apply additional criteria to reduce the likelihood that the procedural instances are not orthogonal. In particular, the system has also selected for each entity a buffer that extends beyond the spatial extent for the entity (represented as a dashed circle) and has required that no entity in a different instance can have a spatial extent that is within that buffer.

Because of the spatial extents and because of the buffer, certain entities within the environment are not selected as part of a procedural instance in the iteration that is depicted in FIG. 4B. These unselected entities, e.g., entity 416, are represented as dots without shaded or dashed circles. In particular, the system has not selected these entities because their spatial extents intersected with the spatial extent or buffer of another entity that was selected as part of a procedural instance. For entity 416, the entity was not selected because the spatial extent of entity 416 would have intersected with the spatial extent or buffer of entity 414. For example, given the sampled spatial extents and buffers, the system could have selected the spatial extents that maximize the number of procedural instances that can be included in the current set of instances without violating any of the criteria.
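The following sketch illustrates the overlap test with a simple greedy pass over the entities; it is an approximation of the maximizing selection described above, and the entity coordinates, extents, and buffers are hypothetical values chosen so that entity 416 conflicts with entity 414 as in FIG. 4B.

    import math

    def select_orthogonal_instances(entities):
        """Greedily keep entities whose spatial-extent circles stay outside every
        previously kept entity's extent-plus-buffer region, so the resulting
        single-entity procedural instances are spatially orthogonal."""
        selected = []
        for entity in entities:
            conflict = False
            for other in selected:
                distance = math.hypot(entity["x"] - other["x"], entity["y"] - other["y"])
                # Two instances conflict when either entity's extent falls inside the
                # other's extent-plus-buffer region.
                min_separation = entity["extent"] + other["extent"] + max(entity["buffer"],
                                                                          other["buffer"])
                if distance < min_separation:
                    conflict = True
                    break
            if not conflict:
                selected.append(entity)
        return selected

    entities = [
        {"id": 412, "x": 0.0, "y": 0.0, "extent": 3.0, "buffer": 1.0},
        {"id": 414, "x": 10.0, "y": 0.0, "extent": 4.0, "buffer": 1.5},
        {"id": 416, "x": 14.0, "y": 0.0, "extent": 3.5, "buffer": 1.0},
    ]
    print([e["id"] for e in select_orthogonal_instances(entities)])  # 416 is excluded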

The system selects a temporal extent for each procedural instance or, if different controllable elements have different temporal extents, for each controllable element of each procedural instance (step 404). As described above, the temporal extent defines the time window that is associated with each of the procedural instances or the time windows that are associated with the controllable elements within the procedural instances.

In some cases, the temporal extent can be fixed, i.e., it is known to a user of the system before the operation of the control system which environment responses observed for a given entity in the environment should be attributed to the procedural instance that includes that entity. In other cases, the temporal extent can be unknown or be associated with some level of uncertainty, i.e., a user of the system does not know or does not specify exactly how long after a set of settings is applied the effects of that setting can be observed.

In cases where the temporal extent is not fixed, the system samples a value for the temporal extent from the range currently defined by the temporal extent parameters based on the current causal model for the temporal extent parameters. As described above, different entities (and therefore different procedural instances) can have different sets of temporal extent parameters or all of the entities can share the same set of temporal extent parameters.

The system generates procedural instances based on the selected spatial extents and the selected temporal extents (step 406). In other words, the system divides the entities in the environment based on the spatial extents, i.e., so that no entity that is in a procedural instance has a spatial extent (or buffer, if used) that intersects with the spatial extent of another entity that is in a different procedural instance, and associates each procedural instance with the time window defined by the temporal extent for the procedural instance.

FIG. 5 is a flow diagram of an example process 500 for selecting control settings for the current set of instances. For convenience, the process 500 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 500.

The system determines current procedural instances (step 502), e.g., as described above with reference to FIG. 4A.

The system then performs steps 504-514 for each of the controllable elements to select a setting for the controllable element for all of the current procedural instances.

Optionally, the system clusters the current procedural instances based on the environment characteristics, i.e., generates multiple clusters for the controllable element (step 504). Because the clustering is performed per controllable element, the system can cluster the current procedural instances differently for different controllable elements. Clustering the procedural instances is described below with reference to FIG. 7.

That is, when the system is currently performing the clustering phase, the system first determines the current cluster assignments for the current procedural instances. After the system determines the current cluster assignments, the system performs an iteration of the steps 506-514 independently for each cluster.

When the system is not currently performing the clustering phase, the system does not cluster the current procedural instances and performs a single iteration of the steps 506-514 for all of the current procedural instances.

The system determines a current hybrid to baseline ratio (step 506). In particular, when the set of ratio parameters for the controllable element includes only a single value, the system selects the current value of the ratio parameter as the current hybrid to baseline ratio. When the set of ratio parameters for the controllable element defines a range of possible values, the system samples a value for the hybrid to baseline ratio from the current range of possible values defined by the ratio parameters based on the causal model for the set of ratio parameters.

The system identifies each instance as either a hybrid instance for the controllable element or a baseline instance for the controllable element based on the current hybrid to baseline ratio (step 508). For example, the system can assign each instance to be a hybrid instance with a probability that is based on the ratio or can randomly divide up the total number of instances so as to match the ratio as closely as possible. Alternatively, when the system is stochastically varying at least one of the internal parameters based on a difference between hybrid instance performance and baseline instance performance, the system may apply an assignment scheme that assigns the instances based on the current ratio and that accounts for the blocking scheme used when computing the causal model that measures the difference in performance, i.e., as described above.

The system selects control settings for the controllable element for the baseline instances based on the baseline values of the internal parameters and in accordance with the assignment scheme (step 512). In other words, the system selects control settings for the baseline instances based on the baseline probability distribution over the possible values for the controllable element determined at the outset of the initialization phase.

The system selects control settings for the hybrid instances based on the current causal model and in accordance with the assignment scheme (step 514).

In particular, the system maps the current causal model to a probability distribution over the possible settings for the controllable element. For example, the system can apply probability matching to map the impact measurements and confidence intervals for the controllable element in the causal model to probabilities.

Generally, the system assigns the control settings based on these probabilities and so that a sufficient number of blocked groups will later be identified by the system when computing d-scores. As a particular example, the system can then divide the hybrid instances into blocked groups (based on the same blocking scheme that will later be used to compute d-scores) and then select control settings within each blocked group in accordance with the probability distribution over the possible settings, i.e., so that each instance in the blocked group is assigned any given possible setting with the probability specified in the probability distribution.
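One possible sketch of this blocked assignment is shown below. It groups hybrid instances into blocked groups whose size equals the number of possible settings and fills each group by probability-weighted sampling without replacement; this only approximates the stated per-instance marginal probabilities, and the instance identifiers and probabilities are hypothetical.

    import numpy as np

    def assign_hybrid_settings(hybrid_instance_ids, possible_settings, setting_probs, rng=None):
        """Assign settings to hybrid instances within blocked groups so that every
        blocked group can later contribute one d-score per assigned setting."""
        rng = rng or np.random.default_rng()
        group_size = len(possible_settings)
        assignments = {}
        ids = list(hybrid_instance_ids)
        for start in range(0, len(ids), group_size):
            group = ids[start:start + group_size]
            # Draw distinct settings for this blocked group, weighted by the
            # probabilities derived from the causal model.
            chosen = rng.choice(len(possible_settings), size=len(group), replace=False,
                                p=setting_probs)
            for instance_id, idx in zip(group, chosen):
                assignments[instance_id] = possible_settings[idx]
        return assignments

    settings = ["low", "mid", "high"]
    probs = [0.5, 0.3, 0.2]
    print(assign_hybrid_settings([f"instance_{i}" for i in range(6)], settings, probs))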

FIG. 6 is a flow diagram of an example process 600 for updating the causal model for a given controllable element and a given type of environment response. For convenience, the process 600 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 600.

The system can perform the process 600 for each controllable element and for each type of environment response for which the system is maintaining a causal model. For example, when the system is maintaining a causal model that models causal effects for only a single performance metric, the system only performs the process 600 for the performance metric. Alternatively, when the system is maintaining a causal model that models causal effects for multiple different types of environment responses, the system performs the process 600 for each type of environment response, e.g., each different type of sensor reading or measurement.

When the system is currently clustering procedural instances into clusters, the system can perform the process 600 independently for each cluster. That is, the system can independently maintain and update a causal model for each cluster.

The system determines a current data inclusion window for the controllable element (step 602), i.e., based on the current data inclusion window parameters for the controllable element. In particular, when the set of data inclusion window parameters for the controllable element includes only a single value, the system selects the current value of the data inclusion window parameter as the current data inclusion window. When the set of data inclusion window parameters for the controllable element defines a range of possible values, the system samples a value for the data inclusion window from the range of values currently defined by the set of data inclusion window parameters. When the data inclusion window parameters are not varied by the system, the system sets the value to the fixed, initial data inclusion window or samples a value from the fixed range of possible values.

The system obtains, for each possible value of the controllable element, the environment responses of the given type that have been recorded for instances for which the possible value of the controllable element was selected (step 604). In particular, the system obtains only the environment responses for instances that occurred during the current data inclusion window.

The system updates the impact measurements in the causal model based on the environment responses for the possible settings of the controllable element (step 606).

That is, the system determines a set of blocked groups based on a blocking scheme, e.g., one of the blocking schemes described above.

For each blocked group, the system then determines a respective d-score for each possible setting that was selected in any of the instances in the blocked group. Generally, the system computes the impact measurements, i.e., the d-scores, for a given controllable element based on the blocking scheme, i.e., computes d-scores between environment responses for instances that were assigned to the same blocked group.

As a particular example, the impact measurement d_i for a possible setting i of the controllable element in a blocking scheme that assigns blocked groups to include at least one instance having each possible setting may satisfy:

$$d_i = x_i - \frac{\sum_{j \neq i} x_j}{N - 1},$$

where x_i is the environment response of the given type for the instances within the blocked group where the setting i has been selected, the sum is over all of the possible settings except i, and N is the total number of possible settings.

As another particular example, the impact measurement d_i for a possible setting i of the controllable element in a blocking scheme that assigns pairs of instances to blocked groups may satisfy:

$$d_i = x_i - x_{i+1},$$

where x_i is the environment response of the given type for the instance within the blocked group where the setting i was selected and x_{i+1} is the environment response of the given type for the instance within the blocked group where the setting i+1 was selected, where the setting i+1 is the immediately higher possible setting for the controllable element. For the highest setting for the controllable element, the setting i+1 can be the lowest setting for the controllable element.

As yet another particular example, the impact measurement d_i for a possible setting i of the controllable element in a blocking scheme that assigns pairs of instances to blocked groups may satisfy:

$$d_i = x_i - x_1,$$

where x_1 is the environment response of the given type for the instance that has a predetermined one of the possible settings for the controllable element selected.

The system then computes the updated overall impact measurement for a given setting i as the mean of the d-scores computed for the setting i.

In some cases, the d-score calculation can be proportional rather than additive, i.e., the subtraction operation in any of the above definitions can be replaced by a division operation.
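A minimal sketch of the first d-score definition and the averaging step follows, with an optional proportional variant; the blocked-group data and setting names are hypothetical.

    import numpy as np

    def d_scores_for_blocked_group(responses_by_setting, proportional=False):
        """Compute a d-score for every setting observed in one blocked group.

        `responses_by_setting` maps each setting selected in the group to the
        environment response observed for the instance(s) assigned that setting.
        The d-score for setting i compares its response to the mean response of the
        other settings in the group (additively by default, proportionally if asked)."""
        settings = list(responses_by_setting)
        scores = {}
        for i in settings:
            others = [responses_by_setting[j] for j in settings if j != i]
            reference = float(np.mean(others))
            scores[i] = (responses_by_setting[i] / reference if proportional
                         else responses_by_setting[i] - reference)
        return scores

    def overall_impact_measurements(blocked_groups, proportional=False):
        """Average per-group d-scores into an overall impact measurement per setting."""
        per_setting = {}
        for group in blocked_groups:
            for setting, score in d_scores_for_blocked_group(group, proportional).items():
                per_setting.setdefault(setting, []).append(score)
        return {setting: float(np.mean(scores)) for setting, scores in per_setting.items()}

    # Example: three blocked groups for an element with settings "low", "mid", "high".
    groups = [
        {"low": 4.0, "mid": 6.0, "high": 9.0},
        {"low": 5.0, "mid": 7.0, "high": 8.0},
        {"low": 3.5, "mid": 6.5, "high": 9.5},
    ]
    print(overall_impact_measurements(groups))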

The system determines, for each of the possible values of the controllable element, a confidence interval for the updated impact measurement (step 608). For example, the system can perform a t-test or other statistical hypothesis test to construct a p% confidence interval around the updated impact measurement, i.e., around the mean of the d-scores, where p is a fixed value, e.g., 95%, 97.5%, or 99%. In some implementations, the system applies different p values for different controllable elements, e.g., when the external data specifies that different controllable elements have different costs or levels of risk associated with deviating from the baseline probability distribution for the different controllable elements.

In some implementations, the system applies a correction, e.g., a Bonferroni correction, to the confidence intervals in the case where certain settings for the controllable element are associated with different costs of implementation or a higher risk. In particular, in a Bonferroni correction, a correction is applied such that if N confidence intervals are computed for N possible settings of a controllable element and the overall desired confidence level for that element is 95% (i.e., alpha=0.05), then the alpha value used for each individual test to compute a confidence interval is alpha/N. If certain settings are associated with a higher risk or cost of implementation, then a “corrected” alpha value associated with a higher level of confidence may be specified for those settings. This forces the system to accumulate more data before exploiting those settings.
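As a sketch under these definitions, the following fragment builds a t-based confidence interval around the mean d-score for one setting and applies a Bonferroni-style alpha/N correction; the function name and the example d-scores are hypothetical.

    import numpy as np
    from scipy import stats

    def impact_confidence_interval(d_scores, confidence=0.95, num_settings=1):
        """Construct a t-based confidence interval around the mean d-score for one
        setting, dividing alpha by the number of settings (Bonferroni correction) so
        that the intervals for the element jointly reach the desired confidence."""
        d_scores = np.asarray(d_scores, dtype=float)
        alpha = (1.0 - confidence) / num_settings      # corrected per-interval alpha
        mean = d_scores.mean()
        sem = stats.sem(d_scores)                       # standard error of the mean
        half_width = stats.t.ppf(1.0 - alpha / 2.0, df=len(d_scores) - 1) * sem
        return mean - half_width, mean + half_width

    # Example: d-scores for one setting collected over several blocked groups.
    print(impact_confidence_interval([1.2, 0.8, 1.5, 0.9, 1.1], confidence=0.95, num_settings=4))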

FIG. 7 is a flow diagram of an example process 700 for clustering a set of procedural instances for a given controllable element. For convenience, the process 700 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 700.

The system selects the current hyperparameters for the clustering technique being used by the system from the clustering parameters for the controllable element (step 702). In particular, each hyperparameter that can be varied by the system is defined by a distinct set of internal parameters. That is, the clustering parameters include a separate set of internal parameters for each hyperparameter that is under the control of the system during operation.

The system can use any of a variety of clustering techniques to perform the clustering. However, the hyperparameters that are varied by the system will generally include hyperparameters for the size of the clusters generated by the clustering technique and, in some cases, the environment characteristics of the instances that are considered by the clustering technique when generating the clusters.

As one example, the system can use a statistical analysis, e.g., a factorial analysis of variance (ANOVA), to generate clustering assignments. In particular, factorial ANOVA is used to find the factors, i.e., the environment characteristics, that explain the largest amount of variance between clusters. That is, as d-scores are computed for each possible control setting, factorial ANOVA can monitor interaction terms between these treatment effects and external factors. As data accumulates and interactions start emerging, factorial ANOVA creates different clusters of instances across space and time where each cluster is representative of distinct external factor states or attributes.

As another example, the system can use a machine learning technique to generate the clustering assignments. As a particular example of a machine learning technique, the system can use decision trees. Decision trees are a classical machine learning algorithm used for classification and regression problems. Decision trees use a recursive partitioning scheme, sequentially identifying the best variable, i.e., the best environment characteristic, to split on using information-theoretic functions like the Gini coefficient. As another particular example of a machine learning technique, the system can use conditional inference trees. Like decision trees, conditional inference trees are a recursive binary partitioning scheme. The algorithm proceeds by choosing a sequence of variables to split on, based on a significance test procedure, to partition based on the strongest environment characteristic factors. As another particular example, the system can process data characterizing each of the procedural instances and their associated environment characteristics using a machine learning model, e.g., a deep neural network, to generate an embedding and then cluster the procedural instances into the specified clusters based on similarities between the embeddings, e.g., using k-means clustering or another clustering technique. As a particular example, the embeddings can be the output of an intermediate layer of a neural network that has been trained to receive data characterizing a procedural instance and to predict the value of the performance metric for the procedural instance.
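The embedding-then-cluster variant could look roughly like the sketch below; for brevity it standardizes the raw environment characteristics in place of a learned neural-network embedding, uses scikit-learn's k-means, and the example characteristic values are hypothetical.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_instances(instance_features, num_clusters, seed=0):
        """Cluster procedural instances by their environment characteristics.

        `instance_features` is a (num_instances, num_characteristics) array; in the
        full system the rows could instead be embeddings taken from an intermediate
        layer of a network trained to predict the performance metric."""
        features = np.asarray(instance_features, dtype=float)
        # Standardize each characteristic so no single scale dominates the distances.
        features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-9)
        return KMeans(n_clusters=num_clusters, n_init=10, random_state=seed).fit_predict(features)

    # Example: 6 instances described by two characteristics (ambient temperature, load).
    characteristics = [[21.0, 0.3], [22.5, 0.35], [35.0, 0.9],
                       [34.0, 0.85], [20.5, 0.28], [36.0, 0.92]]
    print(cluster_instances(characteristics, num_clusters=2))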

In some cases, the system can switch clustering techniques as the operation of the system progresses, i.e., as more data becomes available. For example, the system can switch from using a statistical technique or a decision tree to using a deep neural network once more than a threshold number of procedural instances are available.

The system clusters the instances in the current data inclusion window using the clustering technique in accordance with the selected hyperparameters (step 704).

The system computes a causal model for each cluster (step 706), i.e., as described above with reference to FIG. 6 but using only the instances that have been assigned to the cluster.

The system then assigns control settings for the controllable element independently within each of the clusters based on the computed causal model for the cluster (step 708), i.e., as described above with reference to FIG. 5. In particular, the system clusters each current instance using the clustering technique and then assigns the control settings for a given current instance based on the cluster that the current instance is assigned to and, if the given current instance is not designated a baseline instance, using the causal model computed for the cluster.

The system can then determine whether the clustering parameters need to be adjusted (step 710), i.e., determines whether the current values of the clustering parameters are not optimal and, if so, updates the clustering parameters for the controllable element. In particular, during operation, the system updates the clustering parameters to balance two competing goals: (1) pooling instances into clusters such that there is maximum within-cluster similarity of the impact of controllable elements on the performance metric and maximum between-cluster difference in the impact of controllable elements on the performance metric and (2) maximizing the size of clusters in order to have the largest possible within-cluster sample size, to increase the precision of the causal model. The system can accomplish this by adjusting the values using heuristics, using stochastic sampling, or both.

The system can determine whether to change the number of clusters, i.e., to change the value of the clustering parameter for the controllable element, in any of a variety of ways, i.e., based on any of a variety of heuristics.

More generally, as described above, for any given set of internal parameters that is varied by the system, the system can adjust the set of internal parameters in one of three ways: (i) using a heuristic-based approach to adjust a single value, (ii) using stochastic variation to adjust likelihoods assigned to different values in a range of values, or (iii) using a heuristic-based approach to adjust the range of values while using stochastic variation to adjust likelihoods within the current range.

The heuristic-based approach can include heuristics that are based on properties of the current causal model, heuristics that are based on a priori statistical analyses, or both.

In the stochastic variation approach, the system maintains a causal model that measures causal effects between different values within the current range and a figure of merit for the set of internal parameters. The system then maps the causal model to probabilities for the different values and, when required, selects a value for the internal parameter based on the probabilities. As will be described in more detail below, the figure of merit for any given set of internal parameters is generally different from the performance metric that is being measured in the causal model that models causal relationships between the control settings and the performance metric.

FIG. 8 is a flow diagram of an example process 800 for updating a set of internal parameters using stochastic variation. For convenience, the process 800 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 800.

The process 800 can be performed for any set of internal parameters that is being updated using stochastic variation. Examples of such internal parameters can include any or all of sets of data inclusion window parameters, sets of clustering parameters, sets of ratio parameters, sets of spatial extent parameters, sets of temporal extent parameters, and so on.

As described above, during the clustering phase and for any set of internal parameters other than the clustering parameters, the system can perform the process 800 independently for each cluster or for each controllable element and for each cluster.

Additionally, in cases where the clustering parameters are varied using stochastic variation, the system can also perform the process 800 independently for each controllable element.

The system maintains a causal model for the set of internal parameters that measures the causal relationships between the different possible values for the internal parameter and a figure of merit for the set of internal parameters (step 802).

For example, the figure of merit for the set of internal parameters can be the difference between the performance of the hybrid instances and the performance of the baseline instances. In this example, the figure of merit measures the relative performance of the hybrid instances to the baseline instances and the system computes impact measurements, i.e., d-scores, on this figure of merit for different values in the range defined by the internal parameters.

Thus, when computing the causal model for the set of internal parameters, the system proceeds as described above with reference to FIG. 6, except that (i) the possible settings are possible values for the internal parameter and (ii) each x_i in the d-score calculation is a difference between (1) a performance metric for a hybrid instance for which the control settings were assigned with the possible value for the internal parameter selected and (2) a performance metric for a corresponding baseline instance.

As another example, the figure of merit for the set of internal parameters can be a measure of the precision of the causal model for the controllable element, e.g., a measure of the width of the confidence intervals for the different settings of the controllable element.

This maintained causal model can be determined based on a data inclusion window for the set of internal parameters. When the set of internal parameters actually is the data inclusion window parameters, the data inclusion windows are different for the different possible values in the current range. When the set of internal parameters is a different set of internal parameters, the data inclusion window can be a separate set of internal parameters that is fixed, that is varied based on heuristics as described below, or that is varied based on stochastic variation as described in this figure.

The system maps the causal model to a probability distribution over possible values in the range of values, e.g., using probability matching (step 804). That is, the system maps the impact measurements and the confidence intervals to probabilities for each possible value in the range of values using probability matching or another appropriate technique.

When a value is required to be sampled from the range, the system samples values from the range of possible values in accordance with the probability distribution (step 806). That is, when a value from the range defined by the internal parameters is needed for the system to operate, e.g., to assign a temporal extent to a procedural instance, to assign a data inclusion window to a given controllable element, to determine a hyperparameter for a clustering technique, or to assign a current hybrid to baseline ratio for the current set of instances, the system samples from the range of possible values in accordance with the probability distribution. By sampling the values in this manner, the system ensures that values that are most likely to optimize the figure of merit for the set of internal parameters, e.g., to maximize the delta between hybrid and baseline instances, are sampled more frequently while still ensuring that the space of possible values is explored.
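One possible probability-matching sketch follows. It approximates each value's impact estimate as a normal distribution, repeatedly draws from those distributions, and records how often each value wins; treating the confidence-interval half-width as a stand-in for the standard deviation of the estimate is an assumption of this sketch, and the example numbers are hypothetical.

    import numpy as np

    def probability_matching(impact_means, ci_half_widths, num_draws=10_000, rng=None):
        """Map impact measurements and confidence intervals for each possible value
        to selection probabilities: the probability of a value is the fraction of
        posterior-style draws in which that value has the best impact."""
        rng = rng or np.random.default_rng()
        means = np.asarray(impact_means, dtype=float)
        sds = np.maximum(np.asarray(ci_half_widths, dtype=float), 1e-9)
        draws = rng.normal(means, sds, size=(num_draws, len(means)))
        wins = np.bincount(draws.argmax(axis=1), minlength=len(means))
        return wins / num_draws

    def sample_value(possible_values, probabilities, rng=None):
        """Sample a value for the internal parameter according to the matched probabilities."""
        rng = rng or np.random.default_rng()
        return rng.choice(possible_values, p=probabilities)

    probs = probability_matching(impact_means=[0.2, 0.5, 0.1], ci_half_widths=[0.3, 0.4, 0.2])
    print(probs, sample_value([10, 20, 30], probs))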

The system computes an update to the causal model (step 808). That is, as new environment responses for new procedural instances are received, the system re-computes the causal model by computing overall impact measurements, i.e., means of d-scores, and confidence intervals around the overall impact measurements. The system can perform this computation in the same manner as the causal model updates described above with reference to FIG. 6, i.e., by selecting blocked groups, computing d-scores within those blocked groups (based on the figure of merit for the set of parameters described above), and then generating the causal model from those d-scores. By computing the updated causal model, the system effectively adjusts how frequently each of the possible values in the range is sampled, i.e., because the mapping to probabilities is also adjusted.

By repeatedly performing the process 800, the system can repeatedly adjust the probabilities assigned to the values in the range to favor values that result in a more optimal figure of merit.

For example, when the set of internal parameters is the data inclusion window parameters, maintaining a causal model that models the effect that different data inclusion window values have on hybrid versus baseline performance allows the system to select data inclusion windows that result in more accurate and robust causal models being computed for the controllable element.

As another example, when the set of internal parameters is the spatial or temporal extent parameters, maintaining a causal model that models the effect that different spatial or temporal extent values have on hybrid versus baseline performance allows the system to select spatial or temporal extents that result in orthogonal procedural instances that maximize the hybrid instance performance relative to baseline instance performance.

As another example, when the set of internal parameters defines a clustering hyperparameter, maintaining a causal model that models the effect that different hyperparameter values have on hybrid versus baseline performance allows the system to select clustering assignments that maximize the performance of the system, i.e., more effectively identify clustering assignments that satisfy the goals described above with reference to FIG. 7.

In some implementations, the system determines whether to adjust the current range of possible values for the internal parameter (step 810). As described above, the range of possible values for any given internal parameter can be fixed or can be adjusted using heuristics to ensure that the space of possible values that is being explored remains rational throughout the operation of the system.

One example of a heuristic that can be used to adjust the current range of possible values is a heuristic that relies on the shape of the current causal model. In particular, the system can increase the upper bound of the range (or increase both the upper and lower bounds of the range) when the impact measurements in the causal model are growing in magnitude as the current upper bound of the range is approached and decrease the lower bound (or decrease both the upper and lower bounds) when the impact measurements are growing in magnitude as the current lower bound of the range is approached.
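A minimal sketch of this shape-based heuristic, assuming the impact measurements are listed in order from the value nearest the lower bound to the value nearest the upper bound and that the step size and example numbers are hypothetical:

    def adjust_range_from_model_shape(lower, upper, impact_by_value, step):
        """Shift the stochastic-variation range based on the shape of the causal model:
        raise the upper bound if impacts are still growing in magnitude toward it, and
        lower the lower bound if impacts are growing in magnitude toward it."""
        if abs(impact_by_value[-1]) > abs(impact_by_value[-2]):
            upper += step
        if abs(impact_by_value[0]) > abs(impact_by_value[1]):
            lower = max(0, lower - step)
        return lower, upper

    print(adjust_range_from_model_shape(10, 50, [0.05, 0.1, 0.4, 0.9], step=5))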

Another example of a heuristic that can be used to adjust the current range of possible values is a heuristic that relies on a statistical power analysis.

For example, when the set of internal parameters is a set of clustering parameters that define a cluster size used by the clustering technique, the system can compute a statistical power curve that represents the impact that changes in sample size, i.e., cluster size, will have on the width of the confidence intervals the current causal model is reflecting for the controllable element. Given the nature of statistical power curves, the confidence intervals become more precise quickly at the small end of the sample size range but, as the sample size increases, each additional increase in sample size results in a disproportionately smaller increase in the precision of the confidence intervals (i.e., a disproportionately smaller decrease in the width of the confidence intervals). Thus, exploring larger cluster sizes may lead to very little gain in statistical power and comes with a high risk of not accurately representing the current decision space. To account for this, the system can constrain the range of possible cluster sizes to a range that falls between a lower threshold and an upper threshold on the statistical power curve. By constraining the cluster sizes in this way, the system does not explore clusters that are so small as to result in too little statistical power to compute significant confidence intervals. The system also does not experiment with cluster sizes that are unnecessarily large, i.e., cluster sizes that result in small gains in statistical power in exchange for the risk of failing to capture all the potential variation between instances.
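The sketch below approximates the power curve by the expected t-based confidence-interval half-width as a function of within-cluster sample size, then keeps only sizes that are neither too imprecise nor past the point of diminishing returns; the d-score standard deviation, thresholds, and candidate sizes are hypothetical.

    import numpy as np
    from scipy import stats

    def constrain_cluster_size_range(candidate_sizes, d_score_sd, alpha=0.05,
                                     max_half_width=1.0, min_gain_per_sample=0.005):
        """Constrain the explorable cluster sizes using a precision ("power") curve:
        exclude sizes whose intervals are too wide (too little power) and sizes whose
        extra samples buy almost no additional precision (unnecessarily large)."""
        sizes = np.asarray(sorted(candidate_sizes), dtype=int)
        half_widths = np.array([stats.t.ppf(1 - alpha / 2, df=n - 1) * d_score_sd / np.sqrt(n)
                                for n in sizes])
        # Precision gained per extra sample between consecutive candidates; the first
        # candidate is always allowed to pass this particular check.
        gain = np.concatenate([[np.inf], -np.diff(half_widths) / np.diff(sizes)])
        keep = sizes[(half_widths <= max_half_width) & (gain >= min_gain_per_sample)]
        return (int(keep.min()), int(keep.max())) if keep.size else None

    print(constrain_cluster_size_range(range(5, 201, 5), d_score_sd=2.0))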

As another example, when the set of internal parameters is a set of ratio parameters, the system can perform a statistical power analysis to compute the minimum number of baseline instances that are required to determine, given the current causal model for the ratio parameters, that the hybrid instances outperform the baseline instances with a threshold statistical power. The system can then adjust the lower bound of the range of possible ratio values so that the ratio does not result in a number of baseline instances that is below this minimum number.

As another example of adjusting a range based on heuristics, when the range of temporal extent parameters for the entities in the environment is being updated based on heuristics, the system can maintain for each entity a causal model that measures the causal relationships between (i) the control settings selected at a given control iteration and (ii) the environment responses obtained from the entity at the subsequent control iteration, i.e., at the control iteration immediately after the given control iteration. Because the system is attempting to select temporal extents for entities that ensure that procedural instances are orthogonal, if the temporal extent has been properly selected, this causal model should indicate that the causal effects are likely zero between current control settings and environment responses to subsequent control settings. Thus, the system can determine to increase the lower bound on the range of possible temporal extents if the causal model shows that the confidence intervals for the impact measurements for any of the control settings have less than a threshold overlap with zero.

As another example of adjusting a range based on heuristics, when the range of spatial extent parameters for the entities in the environment is being updated based on heuristics, the system can maintain for each given entity a causal model that measures the causal relationships between (i) the control settings selected at a given control iteration for the procedural instance that includes the given entity and (ii) the environment responses obtained from an adjacent entity to the given entity at the current control iteration. The adjacent entity can be the entity that is closest to the given entity from the entities that are included in the current set of instances for the current control iteration. Because the system is attempting to select spatial extents for entities that ensure that procedural instances are orthogonal, if the spatial extent has been properly selected, this causal model should indicate that the causal effects are likely zero between current control settings for the given entity and environment responses for the adjacent entity. Thus, the system can determine to increase the lower bound on the range of possible spatial extents if the causal model shows that the confidence intervals for the impact measurements for any of the control settings have less than a threshold overlap with zero.

Additional examples of heuristics that can be used to adjust the range of possible values for the data inclusion window and the ratio parameters are described in more detail below with reference to FIG. 11.

FIG. 9 is a flow diagram of an example process 900 for updating the value of a data inclusion window for a given controllable element based on heuristics. For convenience, the process 900 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 900.

Generally, the system performs the process 900 for the data inclusion window when the data inclusion window is a parameter that is being varied based on heuristics and not using stochastic variation.

When the system maintains multiple clusters for the given controllable element, the system can perform the process 900 independently for each cluster, i.e., so that the data inclusion window for the given controllable element within one cluster can be updated differently from the data inclusion window for the given controllable element within another cluster.

The system accesses the current causal model for the given controllable element (step 902).

The system analyzes one or more properties of the current causal model (step 904). For example, the system can perform a normality test to determine whether the d-scores for the various possible control settings for the given controllable element are normally distributed. In particular, the system can conduct a normality test, e.g., a Shapiro-Wilk test, on the d-score distributions for the given controllable element in the current causal model. Generally, the system scales and pools together the d-score distributions between the different possible settings to generate a single distribution and then performs the normality test on the single distribution. The system can perform this test for different data inclusion windows, e.g., for the current causal model computed using the current data inclusion window and one or more alternative causal models computed using one or more alternative data inclusion windows, to find the longest data inclusion window that satisfies the normality test with some prescribed p-value.
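A sketch of this selection rule follows; it standardizes each setting's d-score distribution before pooling (a simplification of the scaling step described above) and uses scipy's Shapiro-Wilk test. The candidate-window data in the usage example is hypothetical.

    import numpy as np
    from scipy import stats

    def longest_normal_window(d_scores_by_window, p_threshold=0.05):
        """Pick the longest candidate data inclusion window whose pooled, standardized
        d-scores still pass a Shapiro-Wilk normality test.

        `d_scores_by_window` maps each candidate window length to a dict of
        {setting: d-score list} computed using only instances inside that window."""
        best = None
        for window in sorted(d_scores_by_window):
            pooled = []
            for scores in d_scores_by_window[window].values():
                scores = np.asarray(scores, dtype=float)
                # Standardize each setting's distribution before pooling them together.
                pooled.extend((scores - scores.mean()) / (scores.std() + 1e-9))
            if len(pooled) >= 3 and stats.shapiro(pooled).pvalue > p_threshold:
                best = window
        return best

    rng = np.random.default_rng(0)
    candidates = {10: {"low": rng.normal(0, 1, 8), "high": rng.normal(1, 1, 8)},
                  30: {"low": rng.exponential(1, 24), "high": rng.exponential(2, 24)}}
    print(longest_normal_window(candidates))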

As another particular example, the system can measure the overlap in the confidence intervals between different impact measurements for the given controllable element in the current causal model. The system can perform this test for different data inclusion windows, e.g., for the current causal model computed using the current data inclusion window and one or more alternative causal models computed using one or more alternative data inclusion windows, to find the data inclusion window that comes closest to a desired degree of overlap.

As another particular example, the system can perform a statistical power analysis to identify the sample size that will result in the current causal model having a desired statistical power. The system can then adjust the data inclusion window so that the number of instances included in the adjusted window equals the identified sample size.

The system determines whether to adjust the data inclusion window parameter based on the results of the analysis (step 906). For example, the system can adjust the data inclusion window parameter to specify the longest data inclusion window that satisfies the normality test as described above, to specify the data inclusion window that comes closest to the desired degree of overlap, or to specify the data inclusion window that includes the number of instances that equals the identified sample size.

The example of FIG. 9 is an example of adjusting the data inclusion window based on a heuristic. However, in general, any of the internal parameters can be adjusted based on a heuristic (instead of being held fixed or adjusted using stochastic variation). A few examples of setting internal parameters based on heuristics follow.

As one example, the system can set the value of the ratio parameter using a statistical power analysis. In particular, the system can perform a statistical power analysis to compute the minimum number of baseline instances that are required to determine that the hybrid instances outperform the baseline instances with a threshold statistical power. The system can then adjust the value of the ratio parameter so that it results in this minimum number of baseline instances.

As another example, to set the value of the cluster size hyperparameter, the system can perform an a priori statistical power analysis to determine a sufficient number of environment responses that are required in order for the causal model to have a desired statistical power, i.e., a single value instead of a range as described above, and set the value for the cluster size to this value.

The above description describes how the system can modify internal parameters during operation of the system. This adjustment of internal parameters can allow the system to effectively account for changes in the properties of the environment, i.e., for environments where the mapping from control settings to environment responses is not static and can change at various times during the operation of the system. Unless properly accounted for, changes in the properties of the environment that do not equally impact all possible control settings for all controllable elements can result in inaccurate causal models that are based on stale data that is no longer relevant and can therefore decrease the effectiveness of the system in controlling the environment.

FIG. 10 is a flow diagram of an example process 1000 for responding to a change in one or more properties of the environment. For convenience, the process 1000 will be described as being performed by a system of one or more computers located in one or more locations. For example, a control system, e.g., the control system 100 of FIG. 1, appropriately programmed, can perform the process 1000.

The system monitors environment responses to control settings selected by the system (step 1002). That is, as described above, the system repeatedly selects control settings and monitors responses to those selected control settings.

The system determines an indication that one or more properties of the environment have changed (step 1004). In particular, the change in the properties of the environment is one that modifies the relative impact that different settings for at least one of the controllable elements have on the environment responses that are being monitored by the system. That is, by determining an indication that one or more properties have changed, the system determines that it is likely that the relative causal effects of different settings on the environment responses have changed, i.e., as opposed to a global change that affects all of the possible control settings equally. While the system does not have access to direct information specifying that a change has occurred, the system can determine, based on the monitored environment responses, an indication that the change has likely occurred.

For example, the system can determine an indication that a change has occurred when the difference between the current system performance and the baseline system performance is decreasing. In particular, as described in more detail below, the system can determine this based on the performance metric increasing for smaller possible values of the data inclusion window, i.e., as reflected by the causal model for the data inclusion window described above.

As another example, the system can determine an indication that a change has occurred when, as described above, a normality test determines that the d-scores for the possible settings of the controllable element are no longer normally distributed.

In response to determining the indication that one or more properties of the environment have changed, the system adjusts the internal parameters of the system (step 1006).

Generally, the system adjusts the values of the internal parameters to indicate that there is an increased level of uncertainty about whether the causal model maintained by the system accurately captures the causal relationships between control settings and environment responses.

For example, the system can adjust the data inclusion window parameters to shrink the data inclusion window, i.e., so that only more recent historical environment responses will be included when determining the causal model. That is, the system can adjust the data inclusion window parameters so that the range of possible data inclusion windows favors shorter data inclusion windows.

As another example, the system can adjust the ratio parameters to decrease the hybrid to baseline ratio, i.e., so that there are fewer hybrid instances relative to baseline instances. By decreasing the ratio, the system places less reliance on the current causal model when selecting control settings and instead more frequently explores the space of possible control settings. That is, the system can adjust the ratio parameters so that the range of possible ratios favors smaller ratios.

As another example, the system can adjust the clustering parameters to decrease the number of clusters that the instances are clustered into. By decreasing the number of clusters, the system prevents the causal model from clustering on characteristics that may no longer be relevant when explaining differences in system performance between clusters.

FIG. 11 shows a representation 1100 of the data inclusion window for a given controllable element of the environment when the set of internal parameters that define the data inclusion window are stochastically varied. As can be seen in the example of FIG. 11, while the data inclusion window could range from zero (i.e., no data is included) to infinity (i.e., all procedural instances are included), the current stochastic variation range 1110 from which data inclusion windows for the given controllable element are sampled is between a lower bound A 1102 and an upper bound B 1104. In some cases, the lower bound A 1102 and the upper bound B 1104 are fixed and the system adjusts the probabilities that are assigned to different values between the lower bound A 1102 and the upper bound B 1104 by updating the causal model as described above. In other cases, the system can vary the lower bound A 1102 and the upper bound B 1104 while also updating the causal model. In particular, the system can adjust the range 1110 based on the likelihood that the relative causal effects of different possible values of the controllable element are changing.

In particular, as shown in FIG. 11, the system maintains a range of possible values for the data inclusion window. That is, the data inclusion window parameters include the lower bound of the range, the upper bound of the range, and the possible values that the data inclusion window can take within the range. The data inclusion window parameters also include probabilities for the possible values that are used when stochastically sampling values. As described above with reference to FIG. 8, these probabilities are adjusted by the system.

In some cases, the range of possible values is fixed. In other cases, however, the system varies the lower and upper bounds of the range based on one or more heuristics to adjust the possible data inclusion windows that are explored by the system and to prevent the system from exploring data inclusion windows that are too short or too long.

For example, the system can compute a statistical power curve that represents the impact that changes in sample size (via changes in the data inclusion window) will have on the width of the confidence intervals the current causal model is using for the controllable element. Given the nature of statistical power curves, the confidence intervals become more precise quickly at the small end of the sample size range but, as the sample size increases, each additional increase in sample size results in a disproportionately smaller increase in the precision of the confidence intervals (i.e., a disproportionately smaller decrease in the width of the confidence intervals). Thus, exploring longer data inclusion windows may lead to very little gain in statistical power and comes with a high risk of not accurately representing the current decision space. To account for this, the system can constrain the range of the data inclusion window to result in a number of samples that falls between a lower threshold and an upper threshold on the statistical power curve. By constraining the data inclusion window in this way, the system does not explore data inclusion windows that are so short as to result in too little statistical power, i.e., does not explore data inclusion windows that result in insufficient data to compute statistically significant confidence intervals. The system also does not explore data inclusion windows that are unnecessarily long, i.e., data inclusion windows that result in small gains in statistical power in exchange for the risk of failing to account for recent changes in the properties of the environment.

As another example, the system can compute a stability measure, e.g., via a factorial analysis, of the interaction between time and the relative impact measurements of the possible control settings for the controllable element. That is, the system can determine the stability of the causal relationships over time. The system can increase either the upper bound or both the upper bound and the lower bound of the data inclusion window range when the stability measure indicates that the causal relationships are stable, while decreasing the upper bound or both the upper bound and the lower bound when the stability measure indicates that the causal relationships are unstable, i.e., dynamically changing. This allows the system to explore smaller data inclusion windows and disregard older data when there is a higher probability that the properties of the environment are changing, while exploring larger data inclusion windows when there is a higher probability that the properties of the environment are stable.
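One way such a stability measure could be computed is a factorial ANOVA on the setting-by-time interaction, sketched below with statsmodels' formula API; the record layout, the function name, and the interpretation threshold are assumptions, and the table labels ("C(setting):C(time_bin)", "PR(>F)") follow statsmodels' anova_lm conventions.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def setting_time_interaction_pvalue(records):
        """Estimate the stability of the causal relationships over time.

        `records` is a list of dicts with keys "setting", "time_bin", and "response",
        and should contain replicate observations per (setting, time_bin) cell. A small
        p-value on the interaction term suggests the relative impact of the settings is
        changing over time (unstable, so favor shorter data inclusion windows); a large
        p-value suggests the relationships are stable."""
        df = pd.DataFrame(records)
        model = smf.ols("response ~ C(setting) * C(time_bin)", data=df).fit()
        table = sm.stats.anova_lm(model, typ=2)
        return float(table.loc["C(setting):C(time_bin)", "PR(>F)"])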

As yet another example, the system can adjust the range based on the shape of the causal model as described above. In particular, the system can explore a range of longer data inclusion windows when the impact measurements grow in magnitude as the data inclusion window gets longer, and a range of shorter data inclusion windows when the impact measurements grow in magnitude as the data inclusion window gets shorter. In other words, the system can move the range down when this difference in impact magnitude decreases and move the range up when it increases. This allows the system to explore smaller data inclusion windows and disregard older data when there is a higher probability that the properties of the environment are changing.

In some cases, the system can apply some combination of these heuristics, e.g., by allowing the upper bound to increase based on either or both of the latter two examples so long as the upper bound does not exceed the size that corresponds to the upper threshold on the statistical power curve, and by allowing the lower bound to decrease based on either or both of the latter two examples so long as the lower bound does not fall below the size that corresponds to the lower threshold on the statistical power curve.
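A sketch of that combination follows, under the assumption that the heuristics each propose new bounds and the power-curve thresholds act as hard clamps; all names are illustrative.

    def clamp_to_power_curve(proposed_lower, proposed_upper, power_lower_n, power_upper_n):
        """Accept bound changes proposed by the stability/shape heuristics only to the
        extent that the resulting window range stays between the sample sizes given by
        the lower and upper thresholds on the statistical power curve."""
        new_lower = max(power_lower_n, proposed_lower)
        new_upper = min(power_upper_n, proposed_upper)
        if new_upper <= new_lower:
            # Degenerate range: fall back to the power-curve bounds themselves.
            new_lower, new_upper = power_lower_n, power_upper_n
        return new_lower, new_upper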

While these examples are described with respect to the data inclusion window, similar heuristics can also be used to adjust the ratio of hybrid to baseline instances, i.e., to increase the number of baseline instances when the probability that the properties of the environment are changing or have recently changed is higher, and to decrease the number of baseline instances when the probability that the properties of the environment are stable is higher.
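An analogous heuristic for the hybrid-to-baseline ratio might look like the sketch below, where "change_probability", the step size, and the bounds are all illustrative assumptions rather than values from the specification.

    def adjust_baseline_fraction(current_fraction, change_probability,
                                 min_fraction=0.05, max_fraction=0.5, step=0.05):
        """Increase the share of baseline (non-model-driven) instances when the
        environment appears to be changing, decrease it when it appears stable."""
        if change_probability > 0.5:
            return min(max_fraction, current_fraction + step)
        return max(min_fraction, current_fraction - step)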

FIG. 12 shows the performance of the described system (denoted as “DCL” in FIGS. 12-18) when controlling an environment relative to the performance of systems that control the same environment using existing control schemes. In particular, FIG. 12 shows the performance of the described system compared to three different kinds of existing control schemes: (i) a “none” scheme in which a system does not select any settings and receives only the baseline environment responses, (ii) a “random” scheme in which a system assigns control settings randomly without replacement, and (iii) various state-of-the-art reinforcement learning algorithms.

In the example of FIG. 12, the environment that is being controlled has 3 controllable elements, each with 5 possible control settings, and the value of the performance metric at each iteration is drawn from a Gaussian distribution which is fixed throughout. Application of particular control settings changes the parameters of the Gaussian distribution from which the value of the performance metric is drawn. These characteristics are similar to those found in simple or highly-controlled real-world environments, for example certain manufacturing lines, but lack the additional complexities that can be encountered in more complex real-world environments.
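The following toy simulator loosely mirrors that setup (3 elements, 5 settings each, and a Gaussian FOM whose mean depends on the chosen settings); the class and its fixed, randomly drawn per-setting effects are assumptions made for illustration, not the benchmark environment itself.

    import random

    class GaussianEnvironment:
        """Toy stand-in for the FIG. 12 environment: each combination of settings
        shifts the mean of the Gaussian from which the FOM is drawn."""

        def __init__(self, num_elements=3, num_settings=5, noise_sd=1.0, seed=0):
            self.rng = random.Random(seed)
            # Hidden, fixed per-setting effect on the FOM mean for each element.
            self.effects = [[self.rng.uniform(-1.0, 1.0) for _ in range(num_settings)]
                            for _ in range(num_elements)]
            self.noise_sd = noise_sd

        def step(self, settings):
            """settings: one chosen setting index per controllable element."""
            mean_fom = sum(self.effects[i][s] for i, s in enumerate(settings))
            return self.rng.gauss(mean_fom, self.noise_sd)

    env = GaussianEnvironment()
    fom = env.step([0, 3, 1])  # apply settings 0, 3 and 1 to the three elements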

The upper set of plots in FIG. 12 shows the performance of each system in terms of mean cumulative FOM (“MeanCumFOM”). The mean cumulative FOM at any given iteration is the average value of the performance metrics, i.e., FOM, received starting from the first iteration and through the given iteration, i.e., the cumulative average performance metric value across time.

The lower set of plots in FIG. 12 shows the performance of each system by mean FOM per instance (“MeanFOM”). The mean FOM per instance at any given iteration is the mean of the performance metrics received for the instance at the given iteration, i.e., without considering earlier iterations.
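For concreteness, the two reported quantities can be computed as in the short sketch below; the function names are illustrative.

    def mean_cumulative_fom(fom_values):
        """MeanCumFOM at iteration t: average of all FOM values from iteration 1 through t."""
        running, out = 0.0, []
        for t, value in enumerate(fom_values, start=1):
            running += value
            out.append(running / t)
        return out

    def mean_fom_per_iteration(fom_by_iteration):
        """MeanFOM at iteration t: average of only the FOM values received at iteration t."""
        return [sum(values) / len(values) for values in fom_by_iteration]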

Generally, the first column (“DCL”) shows the results for the described system while the remaining columns show results for the existing control schemes.

As indicated above, the environment for which the results are shown in FIG. 12 is less complex than many real-world environments, e.g., because the causal effects are fixed, there are no external uncontrollable characteristics that impact the performance measures, and there is no uncertainty about the spatial or temporal extent. However, even in this relatively simple environment the performance of the described system meets or exceeds the performance of state-of-the-art systems, with or without the advanced features enabled.

A description of the state-of-the-art systems that are used to benchmark the performance of the system follows:

- BGE—Boltzmann-Gumbel Exploration [Cesa-Bianchi et al., Boltzmann Exploration Done Right, Conference on Neural Information Processing Systems (NeurIPS), 2017] is a multi-armed bandit algorithm which uses an exponential weighting approach for control setting assignment selection. It maintains a distribution over FOMs for each control setting assignment. At each step, a sample is generated from each of these distributions and the control setting assignment corresponding to the largest sample is selected by the algorithm. The internal parameters of the distributions are then updated using the received feedback.
- Ep Greedy—Epsilon Greedy is a general-purpose multi-armed bandit algorithm that selects a random control setting assignment with probability epsilon and selects the control setting assignment which has given the highest average FOM in the past with probability 1-epsilon. In effect, it explores epsilon percent of the time and exploits 1-epsilon percent of the time.
- UCB—The Upper Confidence Bound (UCB) [Auer et al., Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002] multi-armed bandit algorithm is one of two fundamental approaches to solving multi-armed bandit problems. It works by computing the average FOM and a confidence interval from historical data. It selects a control setting assignment by computing the control setting assignment with the highest average FOM plus confidence interval. In this way it acts optimistically about the control setting assignment's potential FOM and learns over time which control setting assignment has the highest FOM. (A minimal illustrative sketch of this family of algorithms follows this list.)
- Lin UCB—LinUCB [Li et al., A Contextual-Bandit Approach to Personalized News Article Recommendation, International World Wide Web Conference (WWW), 2010] builds on UCB by maintaining an average FOM and confidence interval and makes a key assumption that the expected FOM is a linear function of the characteristics of the procedural instances and the control setting assignments in the experiment. The algorithm is then able to select the control setting assignment that is best for any individual procedural instance. Lin UCB is expected to perform best in situations where the ideal control setting assignment is different for different procedural instance groups.
- Monitored UCB—Monitored UCB [Cao et al., Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019] builds on UCB by computing the average FOM and confidence interval but is designed for environments where abrupt changes in the FOM can occur. As such, it incorporates a change point detection algorithm which identifies when the FOM changes and resets the internal parameters (effectively resetting the average FOM and confidence interval) to start learning the new FOM. Monitored UCB is expected to perform well (better than UCB and variants) in environments where an abrupt change in the FOM occurs.
- ODAAF—Optimism for Delayed Aggregated Anonymous Feedback [Pike-Burke et al., Bandits with Delayed, Aggregated Anonymous Feedback, International Conference on Machine Learning (ICML), 2018] is a multi-armed bandit algorithm designed to work in a setting where feedbacks suffer from random bounded delays. Feedbacks are additively aggregated and anonymized before being sent to the algorithm, which makes this setting significantly more challenging. The algorithm proceeds in phases, maintaining a set of candidates for the possible optimal control setting assignments. In each phase, it plays an iterated round robin strategy amongst these candidates and updates their performance metric value estimates as it receives feedbacks. At the end of each phase, the algorithm eliminates the candidates whose estimated performance metric values are significantly suboptimal.
- Thompson Sampling—Thompson Sampling [Agrawal and Goyal, Analysis of Thompson Sampling for the Multi-armed Bandit Problem, Conference on Learning Theory (COLT), 2012] is a probability matching algorithm and is the other fundamental approach to solving multi-armed bandit problems (the first being optimism-based approaches like UCB). It works by maintaining a distribution over the estimated FOM for each control setting assignment option, sampling from each distribution, and then selecting the control setting assignment option with the highest sampled (estimated) FOM. Once the true FOM is observed, the (posterior) distributions are updated using a Bayesian approach. The algorithm selects each control setting assignment proportional to its probability of being the best control setting assignment.
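To make the flavor of these baselines concrete, the sketch below implements a minimal UCB1-style selector over control setting assignments. It is an illustrative toy, not any of the benchmarked implementations cited above; the class name and update rule are assumptions made for the example.

    import math

    class UCB1:
        """Minimal UCB1-style multi-armed bandit over control setting assignments."""

        def __init__(self, num_assignments):
            self.counts = [0] * num_assignments   # times each assignment was selected
            self.means = [0.0] * num_assignments  # running average FOM per assignment
            self.t = 0

        def select(self):
            self.t += 1
            for assignment, count in enumerate(self.counts):
                if count == 0:
                    return assignment  # try every assignment at least once
            scores = [m + math.sqrt(2.0 * math.log(self.t) / n)
                      for m, n in zip(self.means, self.counts)]
            return max(range(len(scores)), key=scores.__getitem__)

        def update(self, assignment, fom):
            self.counts[assignment] += 1
            n = self.counts[assignment]
            self.means[assignment] += (fom - self.means[assignment]) / n

    # Usage: pick an assignment, observe the resulting FOM, then update the estimates.
    bandit = UCB1(num_assignments=5)
    choice = bandit.select()
    bandit.update(choice, 1.3)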

FIG. 13 shows the performance of the described system relative to the performance of multiple other systems when controlling multiple different environments.

In particular, each of the other systems uses a respective one of the existing control schemes described above to control multiple different environments.

The environments that are being controlled each have 3 controllable elements, each with 5 possible settings, and the values of the performance metric being optimized at each iteration are drawn from a Gaussian distribution.

The complexity of the environments is varied through the addition of various factors that cause variability between different procedural instances.

In particular, the base environment shown in the top set of graphs changes the mean and variance of the Gaussian distribution depending on the procedural instance, i.e., so that different procedural instances can receive different performance metric values even if the same control settings are selected.

Other environments also introduce time-based changes in the effects of applying different possible settings for the controllable elements, underlying sinusoidal behavior in the performance metric, and different setting effects for different instance groups, i.e., effects that represent interactions between environment characteristics and controllable elements.

As can be seen from FIG. 13, many of the existing control schemes generally do well in the simple, baseline case, and a given control scheme might do well with one additional complexity factor, but none of the existing control schemes perform well in all cases. The described system, on the other hand, performs comparably to or better than the best existing control scheme in all of the environments. Thus, the example of FIG. 13 shows that the described system is able to perform as well as or better than the other control schemes for each of the different environments because of the ability of the system to automatically adapt to varied, complex environments without requiring manual model selection, i.e., by continually varying the internal parameters of the system to account for different properties of different environments even when no prior knowledge of the properties of the environment was available.

A detailed explanation of each of the environments being controlled follows; an illustrative sketch of how such variants can be simulated appears after the list.

- 00_base—3 controllable elements with 5 possible settings each, with the performance metric value drawn from a Gaussian distribution. Selection of different IV possible settings changes the mean and/or standard deviation of the distribution. This environment is relatively simple but does have many combinations of possible control settings, as is often found in real-world environments.
- 01_add_subject_var—starting from 00_base, the procedural instances are divided into 3 groups with different base-rate means and standard deviations for their performance metric value distributions. This introduces additional variance in the data without changing the effects of control setting assignments. Procedural instance/EU variance of this type is very typical of the real world. For example, this specific configuration mimics the sales behavior of an assortment of products where a small group of products accounts for the bulk of overall sales (a la the 80/20 rule), a larger group of products has middling sales, and most products have low sales.
- 02_add_dynamic—starting from 00_base, the effects of the IV possible settings undergo multiple transitions at predetermined times (unknown to the algorithms) such that the impact of the IV possible settings is reversed. This changing behavior is very typical of the real world. For example, the effectiveness of different advertising campaigns and techniques regularly changes over space and time (what worked before isn't likely to work now). Similarly, optimal control setting assignment choices on a manufacturing line will vary due to factors like temperature, humidity, and the nuances of specific equipment, e.g., wear and tear.
- 03_add_subject_var_dynamic—a combination of 01_add_subject_var and 02_add_dynamic. The combination of these two behaviors (described above) makes this environment even more similar to many dynamic real-world environments.
- 04_add_sine—starting from 00_base, adds an overall sinusoidal pattern to the performance metric values. This simulates periodic trends (e.g., seasonal, weekly) in the FOM that are independent of the effects of the IV possible settings. Some algorithms have difficulty dealing with the additional data variance. Periodic behavior of this type is very typical of the real world. For example, retail sales, supply chains, and so on often follow weekly, monthly, and seasonal cycles that introduce significant variance in performance metrics. As another example, manufacturing and other processes that are impacted by seasonal changes in weather may also experience similar effects. A key challenge in these situations (that the described system addresses) is to be able to differentiate the impact of marketing activities (for example) from these underlying behaviors.
- 05_add_subject_var_sine—a combination of 01_add_subject_var and 04_add_sine. The combination of these two behaviors (described above) makes this environment even more similar to complex and dynamic real-world environments.
- 06_add_ev_effects—the optimal combination of IV possible settings is different for some procedural instances. This variation in control setting assignment is very typical of real-world situations. For example, different advertising or promotional approaches will work better than others depending on the products involved, recipients of content, space, time, etc.
- 10_complex—a combination of 01_add_subject_var, 02_add_dynamic, 04_add_sine, and 06_add_ev_effects. This environment does the most to capture the behaviors of the real world in that it takes all of the above-described real-world behaviors and combines them into one environment.
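The sketch below illustrates, with purely assumed parameter names, how two of the variants above (the effect reversal of 02_add_dynamic and the sinusoidal pattern of 04_add_sine) could be layered on top of a base FOM; it is not the benchmark code.

    import math

    def fom_with_variants(effect_component, noise, iteration,
                          reverse_after=None, sine_amplitude=0.0, sine_period=50):
        """effect_component: contribution of the chosen settings to the FOM;
        noise: the Gaussian draw.  After reverse_after iterations the setting
        effects are reversed (02_add_dynamic), and an additive sinusoidal pattern
        is superimposed on the FOM (04_add_sine)."""
        sign = -1.0 if (reverse_after is not None and iteration >= reverse_after) else 1.0
        sine_term = sine_amplitude * math.sin(2 * math.pi * iteration / sine_period)
        return sign * effect_component + noise + sine_term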

FIG. 14 shows the performance of the described system relative to the performance of multiple other systems when controlling multiple different environments that have varied temporal effects.

In particular, each of the other systems uses a corresponding existing control scheme to control multiple different environments.

The environments that are being controlled have 4 controllable elements, each with 2 possible settings, and the performance metric values at each iteration are drawn from a Gaussian distribution. The environments have varied temporal delays and durations imposed that affect when the performance metric values are produced relative to the initial application of control settings for a given instance. For example, in the top environment, the environment responses for all effects are delayed by 2 time iterations and last for 3 time iterations. In the following environment, the 4 controllable elements all have different temporal delays and durations. The third and fourth environments add additional complexity and variability.
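One way to picture the delay-and-duration behavior is the small sketch below, which spreads a setting's effect over the iterations at which it is actually felt; the function and its arguments are illustrative assumptions.

    from collections import defaultdict

    def schedule_effects(applications, delay=2, duration=3):
        """applications: dict mapping the iteration at which a control setting was
        applied to that setting's per-iteration effect.  Returns the total effect felt
        at each iteration when every effect is delayed by `delay` iterations and then
        persists for `duration` iterations (the top-environment behavior in FIG. 14)."""
        felt = defaultdict(float)
        for applied_at, effect in applications.items():
            for offset in range(duration):
                felt[applied_at + delay + offset] += effect
        return dict(felt)

    # A setting applied at iteration 10 with effect +0.8 is felt at iterations 12, 13 and 14.
    felt = schedule_effects({10: 0.8})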

As can be seen from the example of FIG. 14, the described system is able to perform as well as or better than the other control schemes for each of the different environments. This showcases the described system's ability to dynamically adapt to the temporal behavior of the effects of applying control settings, i.e., by varying the temporal extent parameters during operation.

In addition, two of the environments include underlying periodic behavior unrelated to IV control setting assignment effects. This behavior is typical of situations encountered in the real world (e.g., advertising, pharmaceuticals) where actions taken have a delayed rather than immediate effect. At the same time, such scenarios often have a residual effect that lasts after control setting assignment is discontinued. In addition, it would be rare to find these temporal behaviors in isolation. Rather, they will most often co-occur with underlying behaviors similar to the sinusoidal pattern shown. As can be seen from FIG. 14, the described system outperforms the conventional systems due to being better able to account for different temporal behaviors, i.e., by adjusting the temporal extent parameters and other internal parameters to adapt to changes in underlying behaviors.

Details of the environments shown in FIG. 14 follow.

- 00_temporal—500 procedural instances; 4 controllable elements with 2 possible settings each, with the performance metric value drawn from a Gaussian distribution. Selection of different IV possible settings changes the mean and/or standard deviation of the distribution. Performance metric values for all effects are delayed by 2 time iterations and last 3 iterations.
- 01_temporal_multi—same as 00_temporal except that the 4 controllable elements have different temporal delays and durations.
- 02_temporal_sine—starting from 00_base, with sinusoidal behavior added.
- 03_temporal_delay_only—same as 00_temporal but with durational behavior removed.
- 04_temporal_multi_delay_only—same as 01_temporal_multi but with durational behavior removed.
- 05_temporal_sine_delay_only—same as 02_temporal_sine but with durational behavior removed.

FIG. 15 shows the performance of the described system with and without clustering. The environment that is being controlled has 3 controllable elements, each with 5 possible settings, and the performance metric value at each iteration is drawn from a Gaussian distribution which is fixed throughout the experiment. The environment being controlled has different optimal control setting assignments (controllable elements) depending on the characteristics of the procedural instances/EUs described by the environment characteristics. One set of control setting assignments will produce good results overall but in fact be negative for a sub-population. If the sub-population is given its specific ideal control setting assignment, the overall utility is improved. This is typical of real-world situations where the optimal control setting assignment may vary greatly based on external characteristics. The left figure shows the performance of the described system with the clustering component included. In this case, the described system assigns a specific control setting assignment for procedural instances/EUs, which results in an overall higher FOM. The right figure shows the performance of the described system without using the clustering component, i.e., without ever entering the clustering phase. In this case, the algorithm utilizes a single overall control setting assignment approach for all procedural instances, which causes it to use a non-optimal control setting assignment for a certain sub-population. As can be seen from FIG. 15, the described system performs better when clustering is used.

FIG. 16 shows the performance of the described system with the ability to vary the data inclusion window relative to the performance of the described system controlling the same environment while holding the data inclusion window parameters fixed. In the example of FIG. 16, the environment that is being controlled exhibits two gradual changes in the relative effects of control settings on performance measures. This is typical of the real world in two ways: 1) the impact of actions (e.g., advertising, manufacturing parameters) is rarely, if ever, static; 2) when such changes happen, they are often gradual in nature, not abrupt. The left figure shows the performance of the described system with the DIW component included. In this case, the described system is able to rapidly detect, e.g., through hybrid-baseline comparisons, that the effects have changed, and the described system can immediately re-learn the best control setting assignment by shrinking the data inclusion window. The right figure shows the performance of the described system without using the DIW component. In this case, the algorithm adapts to the change in treatment effects very gradually. By the time it does so, the effects are already changing again.

FIG. 17 shows the performance of the described system with and without temporal analysis, i.e., with and without the ability to vary the temporal extent. The environment that is being controlled has 4 controllable elements, each with 2 possible settings, and the performance metric value at each iteration is drawn from a Gaussian distribution which is fixed throughout the experiment. The environments have varied temporal delays and carryover behavior imposed that affect when the performance metric values are produced relative to the initial application of IV possible settings. In addition, two of the environments include underlying periodic behavior unrelated to control setting assignment effects. This behavior is typical of situations encountered in the real world (e.g., advertising, pharmaceuticals) in that very often actions taken do not have an immediate effect and they often have a residual effect even after control setting assignment is discontinued. In addition, this temporal variation is often present in the context of other underlying behavior. This figure illustrates the value of the temporal optimization within the described system. The left column shows the performance of the described system using the temporal component. The right column shows the performance of the described system without using the temporal component. As can be seen from the example of FIG. 17, the described system performs significantly better when temporal analysis is used and the environment has these temporal properties.

FIG. 18 shows the performance of the described system when controlling an environment relative to the performance of a system that controls the same environment using an existing control scheme (“Lin UCB”). In the example of FIG. 18, the environment that is being controlled has cyclic underlying behavior unrelated to IV possible setting effects, along with changes in these effects such that the optimal control setting assignment changes over time. These characteristics are similar to those found in many real-world environments, where there are regular underlying dynamics (e.g., weekly, monthly, or seasonal patterns) along with changes over time in the impact of control setting assignments/actions. FIG. 18 shows a subset of time during which the impacts of IV possible settings are changing in the underlying environment (during iterations 200-250). As can be seen from FIG. 18, the existing control scheme stays in its exploit phase based on the previous control setting assignment effects and is unable to rapidly adapt to the change. The described system, on the other hand, adapts to the changing effects quickly and finds incremental improvements under the altered environment effects (upper plots). This results in an increased incremental benefit from the described system (lower plots). Notice that as time continues, the cumulative benefit of utilizing the described system will continue to increase.

While the above description uses certain terms to refer to features of the described system or actions performed by the described system, it should be understood that these are not the only terms that can be used to describe the operation of the system. Some examples of alternative terminology follow. As one example, the controllable elements may instead be referred to as independent variables (IVs). As another example, the environment characteristics may instead be referred to as external variables (EVs). As another example, the environment responses may instead be referred to as dependent variables (DVs). As another example, procedural instances may instead be referred to as experimental units or self-organized experimental units (SOEUs). As another example, the possible settings for a controllable element may instead be referred to as levels of the element (or of the IV). As yet another example, control settings may instead be referred to as process decisions, and assigning control settings for a procedural instance may be referred to as treatment assignment.

The term “repeatedly,” i.e., in the context of repeatedly performing an operation, is generally used in this specification to mean that the operation occurs multiple times with or without a specific sequence. As an example, a process may constantly or iteratively follow a set of steps in a specified order, or the steps may be followed randomly or non-sequentially. Additionally, steps may not all be executed with the same frequency; for example, treatment assignment may be executed more frequently than updating the causal learning, and the frequency of the latter may change over time, for example as the exploit phase becomes dominant and/or as computing capacity/speed requirements change over time.

This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs. The one or more computer programs can comprise one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

In this specification, the term “database” is used broadly to refer to any collection of data: the data does not need to be structured in any particular way, or structured at all, and it can be stored on storage devices in one or more locations. Thus, for example, the index database can include multiple collections of data, each of which may be organized and accessed differently.

Similarly, in this specification the term “engine” is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. Generally, an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings and recited in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.

In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

1. A method of controlling an environment by selecting control settings that include a respective setting for each of a plurality of controllable elements of the environment, the method comprising repeatedly performing the following: repeatedly selecting, by a control system for the environment, control settings for the environment based on internal parameters of the control system, wherein: at least some of the control settings for the environment are selected based on a causal model that identifies, for each controllable element, causal relationships between possible settings for the controllable element and a performance metric that measures a performance of the control system in controlling the environment, and the internal parameters include a first set of internal parameters that define a number of previously received performance metric values that are used to generate the causal model for a particular one of the controllable elements; obtaining, for each selected control setting, a performance metric value; determining, based on the measures of the performance metric values, that generating the causal model for the particular controllable element using a different number of previously received performance metric values would result in higher system performance; and adjusting, based on the determining, the first set of internal parameters.
2. The method of claim 1, wherein the causal model is a causal model for a particular cluster of one or more clusters of procedural instances, and wherein each of the procedural instances is a segment of the environment for which control settings are selected.
3. The method of claim 1 or any preceding claim, wherein the first set of internal parameters includes a window value that specifies the number of previous instances.
4. The method of claim 3, wherein the causal model includes a respective distribution around an impact measurement for each possible setting for the controllable element, and wherein determining, based on the measures of the performance metric values, that generating the causal model for the particular controllable element using a different number of previously received performance metric values would result in higher system performance comprises: updating the causal model based on the measures of the performance metric values; and determining that the impact measurements for the possible settings for the particular element in the updated causal model are not normally distributed.
5. The method of claim 4, wherein adjusting, based on the determining, the first set of internal parameters comprises: adjusting the window value to specify a number of previously received instances for which the impact measurements for the possible settings for the particular element in the updated causal model would be normally distributed.
6. The method of claim 1, wherein the first set of internal parameters specifies a range of possible values for a window value that specifies the number of previously received instances, and wherein the system samples window values from the range of possible values to compute the causal model.

7. The method of claim 6, wherein: some of the control settings are selected based on baseline probabilities for the possible settings that are independent of the causal model; and the method further comprises: maintaining a second causal model that measures causal effects between possible values of the window value and a difference in system performance between (i) control settings selected using the causal model when the causal model is generated based on the possible value of the window value and (ii) control settings selected based on the baseline probabilities.
8. The method of claim 7, wherein determining that generating the causal model for the particular controllable element using a different number of previously received performance metric values would result in higher system performance comprises: determining that the causal effects identified in the second causal model between possible values of the window value have changed once the second causal model has been updated based on the measures of performance values.

9. The method of claim 8, wherein adjusting, based on the determining, the first set of internal parameters comprises: mapping the causal effects identified in the updated second causal model to updated probabilities for the possible window values; and using the updated probabilities to sample window values.
10. The method of claim 7, wherein determining that generating the causal model for the particular controllable element using a different number of previously received performance metric values would result in higher system performance comprises: determining that a value outside of the range of window values is likely to improve system performance.
11. The method of claim 10, wherein adjusting, based on the determining, the first set of internal parameters comprises: adjusting the range of possible window values.
12. The method of claim 10, wherein determining that a value outside of the range of window values is likely to improve system performance comprises: determining a measure of stability of the causal effects in the causal model over time.
13. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform the operations of the method of claim 1.

14. One or more computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform the operations of the method of claim 1.